Neuronal apoptosis inhibitor protein, gene sequence and mutations causative of spinal muscular atrophy

ABSTRACT

The gene for the autosomal recessive neurodegenerative disorder Spinal Muscular Atrophy has been mapped to a region of chromosome 5. The gene encodes a protein having homology with apoptosis inhibitor proteins of viruses so that the encoded protein has been labelled as a neuronal apoptosis inhibitor protein (NAIP). A deletion in the (NAIP) domain was identified in persons with Type I, II and III Spinal Muscular Atrophy (SMA) and not in the normal non-SMA population.

This application is a 371 of PCT/CA95/00581, filed Oct. 17, 1995, which claims priority to UK 9421019.2, filed Oct. 18, 1994 and CA 2,138,425, filed Dec. 29, 1994.

FIELD OF THE INVENTION

The gene for the neuronal apoptosis inhibitor protein (NAIP) has been identified in the q13 region of chromosome 5. Mutations in this gene have been diagnosed in individuals with Type I, II and III Spinal Muscular Atrophy. The amino acid sequence of the neuronal apoptosis inhibitor protein is provided and homology to viral apoptosis proteins demonstrated.

BACKGROUND OF THE INVENTION

In order to facilitate reference to various journal articles in the discussion of various aspects of this invention, a complete listing of the references is provided at the end of the disclosure. Otherwise the references are identified in the disclosure by first author's name and publication year of the reference.

The childhood spinal muscular atrophies (SMAs) are a group of autosomal recessive, neurodegenerative disorders classified into three types based upon the age of onset and clinical progression (Dubowitz et al., 1978; Dubowitz et al., 1991). All three types are characterized by the degeneration of the alpha motor neurons of the spinal cord manifesting as weakness and wasting of the proximal voluntary muscles. Type I SMA is the most severe form with onset either in utero or within the first few months of life. Affected children are unable to sit unsupported and are prone to recurrent chest infections due to respiratory insufficiency, thus rarely surviving the first few years of life (Dubowitz et al., 1978; Dubowitz et al., 1991). This acute form, with a carrier frequency of 1/60 to 1/80, is one of the most frequent fatal autosomal recessive disorders. Affected children with Type II SMA never walk unaided and although the prognosis is variable, such children may die in adolescence. Those affected with Type III SMA maintain independent ambulation but develop weakness any time between the ages of 3 and 17 years manifesting a mildly progressive course (Dubowitz et al., 1978; Dubowitz et al., 1991).

In 1990, all three childhood forms of SMA were genetically mapped to the long arm of chromosome 5 at 5q11.2-13.3 (Brzustowicz et al., 1990; Gilliam et al., 1990; Melki et al., 1990). Subsequent multi-point linkage analyses and the identification of recombinant events have further localized the genetic defect to the region flanked centromerically by D5S435/D5S629 (Soares et al., 1993; Wirth et al., 1993; Clermont et al., 1994) and telomerically by MAP1B/D5S112 (Wirth et al., 1994; MacKenzie et al., 1993; Lien et al., 1991). This interval has been refined by the more recent identification of recombination events indicating that the SMA gene lies distal to CMS-1 (Yaraghi et al., submitted to Human Genetics; van der Steege, et al., submitted to Human Genetics) and proximal to D5S557 (Francis et al., 1993). We and others have detected chromosome 5-specific repetitive sequences with particular abundance in the D5S629/CMS-D5S557 region (Francis et al., 1993; Thompson et al., 1993) which has impeded the isolation and ordering of both clones and simple tandem repeats. An array of cosmid clones spanning the 200 kb CMS-1 (Kleyn et al., 1993)/CATT-1 (Burghes et al., 1994, McLean et al., in press)/D5F150/D5F149/D5F153 (Melki et al., 1994) region within this interval has been constructed.

We established a contiguous array of YAC clones encompassing the SMA-containing D5S435-D5S112 interval of 5q13.1. We then discovered a gene within this interval of 5q13.1 which coded for a neuronal apoptosis inhibitor protein (NAIP). Further studies demonstrated that a deletion in this gene was found in Type I, II and III Spinal Muscular Atrophy.

SUMMARY OF THE INVENTION

A gene encoding a neuronal apoptosis inhibitor protein (NAIP) was discovered in the q13 region of human chromosome 5. According to an aspect of the invention, the cDNA sequence coding for the neuronal apoptosis inhibitor protein is provided and set out in Table 4 (SEQ ID NO: 1). According to another aspect of the invention, the predicted amino acid sequence of the neuronal apoptosis inhibitor protein is provided from the cDNA sequence.

According to another aspect of the invention, a deletion of the neuronal apoptosis inhibitor protein gene was discovered in persons with Type I, II and III Spinal Muscular Atrophy disease. The discovery of the neuronal apoptosis inhibitor protein gene deletion provides a diagnostic indicator for use in the diagnosis of Spinal Muscular Atrophy.

In order to facilitate a further description of various aspects of the invention, reference will be made to various Figures of the drawings. A brief description of the drawings follows this invention summary section.

According to a further aspect of the invention, a human gene is provided which maps to the SMA containing region of chromosome 5q13. The gene comprises exons 1 through 17 of approximately 5.5 kb and has a restriction map for exons 2 through 11, as shown in FIG. 8.

According to a further aspect of the invention, exons 1 through 17 have a restriction map for exons 2 through 16, as shown in FIG. 9D.

According to another aspect of the invention, a human gene of the above aspects wherein exons 5 through 16 code for the NAIP protein having an amino acid sequence biologically functionally equivalent to the amino acid sequence of (SEQ ID NO: 2).

According to another aspect of the invention, the human gene of the above aspects has exons 5 through 16 with a cDNA sequence biologically functionally equivalent to the cDNA sequence of (SEQ ID NO: 1).

According to another aspect of the invention, a purified nucleotide sequence comprises genetic DNA, cDNA, mRNA, anti-sense DNA or homologous DNA corresponding to the cDNA sequence of SEQ ID NO: 1.

According to another aspect of the invention, a DNA molecule sequence coding for the NAIP protein having SEQ ID NO: 2.

According to another aspect of the invention, a purified DNA sequence consists essentially of DNA SEQ ID NO: 1.

According to another aspect of the invention, a purified DNA sequence consists essentially of a DNA sequence coding for amino acid SEQ ID NO: 2.

According to another aspect of the invention, a purified DNA sequence comprises at least 18 sequential bases of SEQ ID NO: 1. DNA probes, PCR primers, DNA hybridization molecules and the like may be provided by using the purified DNA sequence of at least 18 sequential bases.

According to another aspect of the invention, use of the DNA sequences of the above aspects in the construction of a cloning vector or an expression vector.

According to another aspect of the invention, NAIP protein encoded by the above DNA sequences.

According to another aspect of the invention, NAIP protein comprising an amino acid sequence biologically equivalent to the amino acid sequence of SEQ ID NO: 2.

According to another aspect of the invention, NAIP protein consisting essentially of the amino acid sequence of SEQ ID NO: 2.

According to another aspect of the invention, NAIP protein fragment comprises at least 15 sequential amino acids of SEQ ID NO: 2.

According to another aspect of the invention, use of the above amino acid sequences in the production of hybridomas.

According to another aspect of the invention, a method is provided for analyzing a biological sample to determine the presence or absence of a gene encoding NAIP protein.

The method comprises:

i) providing a biological sample derived from the SMA containing region q13 of chromosome 5;

ii) conducting a biological assay to determine presence or absence in the biological sample of at least a member selected from the group consisting of:

a) NAIP DNA SEQ ID NO: 1, and

b) NAIP protein SEQ ID NO: 2.

DESCRIPTION OF DRAWINGS

The original numbering of exons for the NAIP gene began with exon 0 and progressed through exon 16. This is identified in drawings as sequence numbering Scheme #1. However, for conventional exon numbering, it is preferable to begin with exon 1 and progress through to exon 17. This is now identified as sequence numbering Scheme #2.
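The relationship between the two numbering schemes can be illustrated with a short sketch. It assumes, as the paragraph above implies, that Scheme #2 is simply Scheme #1 shifted by one; the function names are hypothetical and are not part of the disclosure.

```python
# Illustrative helpers only: the names are hypothetical, and the simple
# offset of one is inferred from the description of Schemes #1 and #2.

def scheme1_to_scheme2(exon: int) -> int:
    """Convert a Scheme #1 exon number (0-16) to Scheme #2 (1-17)."""
    if not 0 <= exon <= 16:
        raise ValueError("Scheme #1 exon numbers run from 0 to 16")
    return exon + 1


def scheme2_to_scheme1(exon: int) -> int:
    """Convert a Scheme #2 exon number (1-17) to Scheme #1 (0-16)."""
    if not 1 <= exon <= 17:
        raise ValueError("Scheme #2 exon numbers run from 1 to 17")
    return exon - 1


if __name__ == "__main__":
    # Exons 4 and 5 of Scheme #1 correspond to exons 5 and 6 of Scheme #2.
    print([scheme1_to_scheme2(e) for e in (4, 5)])
```

This matches the figure descriptions, in which exons 4 and 5 (Scheme #1) are the same exons as 5 and 6 (Scheme #2).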

FIG. 1: YAC contiguous array of the SMA gene region. YACs are represented by solid lines. Open triangles represent polymorphic STRs, solid triangles represent STSs, open squares represent single copy probes. The genetically defined SMA interval, CMS-1-SMA-D5S557 and the previous D5S629-SMA-D5S557 interval, are indicated above the YACs.

FIG. 2: Long range restriction map of the SMA region. Rare cutter sites are indicated above the solid line. A minimal set of markers is indicated below the solid line. t corresponds to the pYAC4 tryptophan or left end. u corresponds to the pYAC4 uracil or right end. The genetically defined CMS-1-SMA-D5S557 and the D5S629-SMA-D5S557 intervals are estimated at 550 kb and 1.1 Mb respectively.

FIGS. 3A and 3B: Amplification of the CATT-1 locus. Allele sizes are shown below each lane. (A) Amplification of YACs. G: genomic DNA. (B) Amplification of cosmids derived from the chromosome 5 flow sorted library. The 4 distinct alleles are represented by cosmids 40G1 (allele 15), 58G12 (allele 12), 192F7 (allele 10) and 250B6 (allele 7).

FIG. 4: A representative subset of mapped cosmids from our contiguous array. Vertical lines above the solid line are the positions of EcoRI sites. Open triangles represent polymorphic STRs, filled triangles represent STSs, filled squares represent single copy probes and open squares represent transcribed sequences. The STRs which demonstrate strong linkage disequilibrium with Type I SMA are indicated by stars. Cosmids IG3 and IB9 are from the YAC 76CI cosmid library.

FIG. 5: Sequence duplication in the SMA region identified by p151.2. Hybridization of YACs with (A) the 700 bp fragment and (C) the 500 bp fragment. YACs are arranged from left to right, centromeric to telomeric. Hybridization of cosmids with (B) the 700 bp fragment and (D) the 500 bp fragment. (B) The 12 kb fragment is detected in the cosmids; however, the 20 kb fragment is not present. The 2.5 kb and 600 bp fragments detected in 3B3 and IEI respectively are end fragments. (D) Only the 3 kb fragment is detected in the cosmids. Note the absence of the 20 kb band in 24D6 in (A) but its presence in (C). The 700 bp fragment may be deleted in 24D6.

FIG. 6: Degree of linkage disequilibrium observed between Type I SMA and various polymorphic 5q13.1 markers giving a disequilibrium peak at 40G1.

FIG. 7: A PAC contiguous array containing the CATT region comprised of nine clones and extending approximately 400 kb. The 2.2 kb transcript referred to as GA1 is shown.

FIG. 8: Structural organization of the SMA gene. The exons are represented by black boxes and numbered above. The positions of restriction sites are shown: B, BamHI; E, EcoRI; N, NotI. Exons 4 and 5 (Scheme #1) or Exons 5 and 6 of Scheme #2 are frequently deleted in all types of SMA.

FIGS. 9A, 9B, 9C, and 9D are a single page alignment of the information of FIGS. 6, 1, 7 and 8, respectively. FIG. 9(A) is a correlation of the degree of linkage disequilibrium observed in type I SMA families between the disease phenotype and six 5q13.1 markers with the physical map. The SMA containing interval defined by the key recombinations described in the text is shown. Note the proximity of the disequilibrium peak with the centromeric end of the recombinant defined SMA interval.

FIG. 9(B) is a YAC contiguous array covering the SMA region of 5q13.1. For both YAC and PAC contigs, STSs are denoted by solid triangles, polymorphic tandem repeat polymorphisms by empty triangles, single copy clones by solid squares. Note that our physical map places the CMS sublocus containing allele 9, marked with an asterisk, telomeric to the other CMS subloci, while the reverse was observed with genetic recombination data, reflecting, we believe, the variation that exists in this region of 5q13.1.

FIG. 9(C) is a PAC contiguous array covering the SMA region of 5q13.1.

FIG. 9(D) is the gene structure of NAIP as provided in more detail in FIG. 8.

FIG. 10: Exon content of PAC, fetal brain cDNA clones from non-SMA individuals and RT-PCR clones from SMA affected individuals. E158 refers to the deletion of a glutamate residue. RT-PCR was performed only between exons 13 and 4 (Scheme #2); additional, undetected deletions may exist outside of this region.

FIGS. 11A, 11B, 11C and 11D: Structure of intact and internally deleted/truncated versions of the NAIP gene as found in the indicated PACs. In FIG. 11A, exons under Scheme #2 are marked as numbered black boxes. N refers to NotI sites, B to BamHI and E to EcoRI sites. The EcoRV clone that detects the 3 and 9.4 kb EcoRI bands referred to in the text is denoted by EV in FIG. 11B. The 4.8 kb EcoRI/BamHI band deleted in FIG. 14 is also depicted. The 6 kb region containing exons 5 and 6 (Scheme #2) and the 23 kb BamHI fragment resulting from this deletion are both shown in FIGS. 11C and 11D. The location of primers utilized to identify deletions of exons 5 and 6 as well as those that identify the truncated fragment in the deleted NAIP gene are shown above the NAIP structure.

FIG. 12: Intron/exon splice sequences of the NAIP gene.

FIG. 13: Northern blot of adult tissues probed with exon 13 (Scheme #2) of the NAIP locus. Tissues are as marked and the filter was washed at 50° C., 0.2× SSC and exposed for 4 days. Bands can be seen in liver and placenta in the 6-7 kb range.

FIG. 14: Pedigree and Southern blot analysis of consanguineous French-Canadian type III SMA families. Upper panel: probing of a filter containing BamHI/EcoRI digested genomic DNA with a cDNA probe encompassing exons 2 through 9 (Scheme #2) of NAIP reveals the loss of the 4.8 kb fragment that contains exons 5 and 6 (Scheme #2) in all affected individuals resulting in an in-frame deletion. All others, save for the homozygous normal sister and brother, show half dosage for this band. The lower panel shows a BamHI digest of the same family. In affected individuals two superimposed 14.5 kb contiguous fragments have sustained the 6 kb deletion of sequence containing a BamHI site resulting in the generation of a 23 kb band (see FIG. 11). Note the existence of the 23 kb BamHI band in all individuals in the pedigree in keeping with its general dispersion in the population. Similarly, the 9.6 kb BamHI band representing the deletion of exons 1 through 6 (Scheme #2) which is contained in PAC 238D12 and depicted in FIG. 11 can be seen in all individuals including non-SMA carriers.

FIG. 15: Results of PCR amplification in type 3 families 21470 and 24561 using primers 1864 and 1863 which amplify exon 5 (Scheme #2). The reactions were multiplexed with exon 13 (Scheme #2) primers 1258 and 1343 to rule out PCR failure obscuring the results. Failure of amplification in keeping with the homozygous absence of exon 5 (Scheme #2) can be seen to co-segregate with the disease phenotype.

FIG. 16: RT-PCR amplification of RNA from SMA and non-SMA tissues. The letter n refers to RNA from non-SMA tissue and a to RNA from SMA affected tissue. The tissue source is shown above each panel. Lym refers to lymphoblast and fib to fibroblast. All samples were from type 1 SMA patients with the exception of a5 which is from an affected member of the consanguineous type 3 SMA family 24561 shown in FIG. 15.

RNA was reverse transcribed from exon 13 (Scheme #2). Primary PCR of products shown in panels A and B was with exon 1 primer 1884 and exon 13 primers 1285 or 1974 and those in panel C with exon 6 primer 1919 and exon 13 primer 1285. Secondary PCR reactions for panel A used exon 4 primer 1886 and exon 13 primer 1974; for panel B, exon 5 primer 1864 and exon 11 primer 1979 and for panel C, exon 9 primer 1844 and exon 13 primer 1974.

Failure of amplification, or amplification of reduced-size products, can be seen in panel A for spinal cord and lymphoblast tissue for samples a2, a3, a4, a5, a6 and a7. Panel B also shows amplification of reduced size bands in a2 and a3, and in a7 a larger product in keeping with an insertion. Panel C shows reduced band size in keeping with deletions of exons 11 and 12 (Scheme #2) in a2, a3, a9 and a11.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

Unless indicated otherwise, reference to exons in this detailed description of the invention will be based on exon numbering Scheme #2.

Throughout the specification, various letter abbreviations will be used to identify various components or techniques. The following glossary is provided to reference these items.

CTR--complex tandem repeat

DNA--deoxyribonucleic acid

PCR--polymerase chain reaction

PFGE--pulsed field gel electrophoresis

PAC--P1 artificial chromosome

RNA--ribonucleic acid

RT-PCR--reverse transcriptase-polymerase chain reaction

STR--simple tandem repeat

STS--sequence tag site

YAC--yeast artificial chromosome

This invention is directed to the identification, location and sequence characteristics of a gene which encodes Neuronal Apoptosis Inhibitor Protein (NAIP). We have established that mutations in this gene are causative of the previously discussed types I, II and III of Spinal Muscular Atrophy (SMA). It is believed that mutations in this gene result in a lack of production of normal NAIP protein, which is believed to be physiologically involved in the normal human process of maintaining neurological cells and preventing their early death common to affected individuals. The subject gene maps to the SMA containing region of chromosome 5q13.1. The gene comprises exons 1 through 17 of approximately 5.5 kb and has a restriction map for exons 2 through 11, as shown in FIG. 8. An updated restriction map for exons 2 through 16 is provided in FIGS. 9D and 11A. As is appreciated, the gene is considerably longer than the sequence for exons 1 through 17. Considerable intron information exists between the exons which has not yet been sequenced. From the standpoint of diagnosing SMA, the sequence information of exons 1 through 17 is very valuable. The normal sequence is provided in Table 4, as well as being listed under SEQ ID NO: 1. Any genetic mutation, that is, any change in the DNA sequence, whether due to deletion, entire absence of the gene, substitution or polymorphisms and the like, is or can be causative of the disease. The most common mutations are thought to be:

i) deletion of exons 5 and 6 of the gene; or

ii) absence or marked reduction in the copy number of this gene in chromosome 5, which can be causative if the remaining genes are defective.

Any form of biological assay may be employed to diagnose a person's susceptibility to SMA by virtue of conducting a biological assay to determine the normal sequence or absence or presence of mutations in the normal sequence. Such biological assays may include DNA hybridization by use of DNA probes and the like, restriction enzyme analysis, PCR amplification of the relevant portions of the sequence, messenger RNA detection and DNA sequencing of the relevant portions of the sequence, as isolated from chromosome 5 of the human biological sample. It is appreciated that a variety of the above generally identified biological assay procedures may be conducted where the preferred techniques are as follows:

SMA diagnoses will be conducted in two ways. Initially, the genome of the human at risk will be assayed for the absence of NAIP exons 5 and 6. These exons are found to be absent with a frequency of 0.05% in the general population and 50% in Type 1 SMA. The second approach will be to assess the number of copies of the NAIP gene in the individuals being tested. We have observed that there is a general depletion of both deleted and intact forms of the NAIP gene in individuals with SMA. By using a densitometric approach to assess the number of gene copies, an accurate assessment of the risk of having SMA can be established. The best correlation is observed for exons 2 through 4 and exon 13.

In practical terms, the two steps outlined above will be conducted in the following manner:

(i) Two concurrent PCR reactions will be carried out upon the same aliquot of DNA (0.1 micrograms) from the human in question. One primer pair will map into exons 5 and 6 (e.g. primers 1863 SEQ ID NO: 7 and 1864 SEQ ID NO: 8) and one pair will be homologous to a region outside of exons 5 and 6 (primers 1343 SEQ ID NO: 5 and 1258 SEQ ID NO: 4). The latter reaction will be performed to ensure that the PCR is functioning. Two additional controls will be (a) PCR performed on genomic DNA known to contain exons 5 and 6 employing the appropriate primers to ensure that this particular reaction is working, and (b) negative controls using water as a template to ensure absence of contamination. All PCR products will be placed in an agarose gel, separated electrophoretically and analyzed visually.

(ii) Densitometric assessment of SMA risk will be carried out by using PCR primers tagged with fluorescent dyes. PCR reactions employing primers for exons 2 through 4, exon 13 as well as exons 5, 6 and exons 11, 12 will be performed on genomic DNA from the individual being assessed. PCR products will be separated electrophoretically on a gel and the intensity of the individual bands assessed fluorometrically. These values will be correlated with normative values and SMA risk thus ascertained.
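The logic of the two steps can be sketched in code. This is only an illustration under stated assumptions: the class, field and function names are hypothetical, the result strings are invented for the example, and the disclosure gives no numerical thresholds or normative values, so the second function returns only a raw ratio.

```python
from dataclasses import dataclass


@dataclass
class MultiplexResult:
    """Visual gel read-out for step (i). All field names are hypothetical."""
    exon_5_6_band: bool       # product from the exon 5/6 primer pair (test DNA)
    outside_band: bool        # product from the primer pair outside exons 5 and 6
    known_positive_band: bool  # exon 5/6 product from DNA known to contain the exons
    water_blank_band: bool    # any product in the water (no-template) control


def interpret_multiplex(result: MultiplexResult) -> str:
    """Classify a step (i) result, checking the controls before the test band."""
    if result.water_blank_band:
        return "invalid: contamination suspected"
    if not result.known_positive_band or not result.outside_band:
        return "invalid: PCR failure"
    if result.exon_5_6_band:
        return "exons 5 and 6 detected"
    return "exons 5 and 6 not detected"


def relative_band_intensity(sample: float, normative: float) -> float:
    """Step (ii): express a fluorometric band intensity relative to a normative value.

    The normative value must be supplied by the laboratory; none is given here.
    """
    if normative <= 0:
        raise ValueError("normative intensity must be positive")
    return sample / normative


if __name__ == "__main__":
    gel = MultiplexResult(exon_5_6_band=False, outside_band=True,
                          known_positive_band=True, water_blank_band=False)
    print(interpret_multiplex(gel))
    print(relative_band_intensity(50.0, 100.0))
```

Checking the controls first mirrors the stated purpose of the second primer pair and the additional controls: a missing exon 5/6 band is only interpreted when the PCR is shown to be working and free of contamination.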

It is apparent that one's level of NAIP correlates with the risk for other neurodegenerative disorders such as amyotrophic lateral sclerosis and Alzheimer's disease. Consequently, the tests outlined above serve as predictors of risk for these disorders as well. As is described in more detail in the section under heading Baculoviral IAPs, the NAIP protein has significant homology with proteins for inhibiting cell apoptosis. Hence, any neurodegenerative disease which is based on neuronal cell apoptosis can now be predicted by use of the DNA sequence information of the NAIP gene. Such neuronal cell apoptosis is most likely linked to mutations in the NAIP gene similar to the mutations associated with SMA or other mutations in the gene which affect the biological activity of the NAIP protein inhibiting neuronal apoptosis.

As to mRNA detection we propose the following:

RT-PCR is a rapid technique for the analysis of RNA transcripts which is a crucial part of several molecular biology applications. This method is much more sensitive and efficient than traditional Northern blot, RNA dot/slot blots, and in situ hybridization assays. The sensitivity of such a technique allows one to study RNA transcripts of low abundance or RNA isolated from small amounts of cells. In addition, an entire panel of transcripts can be analyzed simultaneously.

Protocol Summary: RNA is first isolated from tissues or cells and then is used as a template for reverse transcription to complementary DNA (cDNA). The reverse transcription (RNA-directed synthesis of DNA) is catalyzed by the enzyme reverse transcriptase. The cDNA is then used as the template for PCR using primers designed to amplify a selected cDNA region. Following PCR, the product is analyzed by agarose gel electrophoresis. The amplified cDNA is identified by the size of the PCR product which is predicted from knowledge of the cDNA nucleotide sequence. The PCR product can be further validated by restriction digestion, hybridization or nucleotide sequencing.

Enzymatic Amplification of RNA by PCR (RT-PCR)

This method is used to enzymatically amplify RNA using PCR.

Detailed Protocol: First the primer is annealed to the RNA. The RNA and cDNA primer are coprecipitated by adding together poly(A)⁺ RNA, cDNA primer, and water. Sodium acetate and ethanol are added. This is precipitated overnight at -20° C. The pellet is collected after microcentrifugation. The pellet is washed with ethanol. Then water, Tris-HCl, and KCl are added and the mixture is heated to 90° C. and then cooled slowly to 67° C. Microcentrifuge and incubate 3 hours at 52° C. This final annealing temperature may be adjusted according to base composition of primer. Alternatively, the primer can be annealed to the RNA by mixing poly(A)⁺ RNA, cDNA primer, and water. This mixture is heated 3 to 15 minutes at 65° C. To the cooled mixture, add reverse transcriptase buffer.

The cDNA is now synthesized. Add reverse transcriptase buffer and AMV reverse transcriptase. This is mixed and incubated 1 hour at 42° C. (depending on the base composition of primer and RNA). Add Tris-Cl/EDTA, mix, then add buffered phenol and vortex. Microcentrifuge and add chloroform to the aqueous phase and vortex. Microcentrifuge. Add sodium acetate and ethanol to aqueous phase. Mix and precipitate overnight at -20° C. Microcentrifuge, dry pellet, and resuspend in water.

The cDNA is then amplified by PCR. The mixture contains prepared cDNA, amplification primers, dNTP mix, amplification buffer, and water. Usually one of the amplification primers is the same as the cDNA primer. If a different amplification primer is used, the cDNA primer should be removed from the cDNA reaction. The reaction mixture is then heated 2 minutes at 94° C., and microcentrifuged to collect condensate. Add Taq DNA polymerase, mix, centrifuge, overlay with mineral oil. Set up amplification cycles. The number of cycles is varied depending upon the abundance of RNA. Forty cycles are usually sufficient. The products are then analyzed by gel electrophoresis in agarose or nondenaturing polyacrylamide gels. The cDNA can also be introduced directly into the amplification step.

In referencing the gene, its cDNA sequences, other DNA sequences and RNA sequences, it is understood that any specifically referenced sequence includes any and all biologically functional equivalents thereof. Similarly, with listed protein sequences, it is understood that such terminology includes any and all biologically functional equivalents thereof insofar as the intended purpose is concerned. In the above identified biological assays it is understood that the full length or partial length sequences of the DNA or protein may be used. Generally it is contemplated that at least 18 sequential bases of the DNA sequence are useful as hybridization probes, PCR primers and the like. Similarly, with protein sequences, at least 15 sequential amino acids may be correspondingly useful in developing protein receptors such as monoclonal antibodies. Such monoclonal antibodies may be made in accordance with the standard techniques by developing hybridomas for producing monoclonals specific to certain antigenic determinants of the protein structure.
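As an illustration of what "at least 18 sequential bases" means in practice, the sketch below lists every run of 18 consecutive bases in a sequence. The function name is hypothetical, and the input string is an arbitrary placeholder, not any part of SEQ ID NO: 1.

```python
def sequential_windows(sequence: str, length: int = 18) -> list[str]:
    """Return every run of `length` consecutive bases found in `sequence`."""
    if length <= 0:
        raise ValueError("length must be positive")
    seq = sequence.upper()
    # A sequence shorter than `length` yields an empty list.
    return [seq[i:i + length] for i in range(len(seq) - length + 1)]


if __name__ == "__main__":
    placeholder = "ACGTACGTACGTACGTACGTACGT"  # arbitrary 24-base string, not NAIP sequence
    windows = sequential_windows(placeholder)
    print(len(windows))   # 24 - 18 + 1 = 7 windows
    print(windows[0])
```

The same sliding-window idea would apply to runs of 15 sequential amino acids by passing `length=15` and a protein sequence; whether a given window makes a usable probe or primer is a separate question that this sketch does not address.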

With reference to Table 4, it would appear that in view of the significant homology of exons 5, 6, 7, 8, 9, 10, 11 and 12 with the IAP domains, such homology may well mean that any deletions or other forms of mutations in these exons may result in the carrier being susceptible to the disease. For example, this is evidenced by the deletion of exons 5 and 6 in low copy numbers in humans being causative of the disease. Hence, any of the sequence information in this region of the gene will be important from a diagnosis standpoint so that any sequential 18 bases of DNA or 15 sequential amino acid residues in this region may be relied on in the diagnosis of SMA in suspected humans. It is of course also understood that other forms of deletions, mutations, polymorphisms and the like in other regions of the gene may be causative of the disease or may be used for other purposes in conjunction with disease analysis, prognosis and perhaps treatment.

Although the restriction maps are useful in identifying the characterizing features of the subject gene, the specific cDNA sequence of exons 1 through 17 has been provided in SEQ ID NO: 1. The encoding portion of the sequence commences at the ATG codon of base 396 of exon 5. The encoding portion ends at the stop codon TAA of exon 16 at base position 4092. Exons 1 through 4 are at the 5' untranslated region and exon 17 is at the 3' untranslated region. As with some genetic related diseases, mutations or polymorphism in the untranslated regions may as well be causative of the disease so that sequence portions in the form of probes and the like in regions other than the region of significant IAP homology may be valuable in the diagnosis of SMA. It is also understood that the sequence information of SEQ ID NO: 1 may be used in the construction of suitable cloning vectors for purposes of producing multiple copies of the gene or expression vectors for purposes of transfecting a host to produce significant quantities by recombinant techniques of the NAIP protein. Sections or fragments or full-length sequence information may be used in the construction of the cloning vectors or expression vectors depending upon the end use of such vectors. With this understanding, the details in respect of the identification of the SMA disease gene, its characteristics, the corresponding protein sequence and their uses in diagnosis are explained.

A YAC contig of the Spinal Muscular Atrophy (SMA) disease gene region along chromosome 5q13 was produced which incorporated the D5S435-D5S112 interval and encompassed 4 Megabases. The CATT-40G1 subloci on the cosmid array showed significant linkage disequilibrium with Spinal Muscular Atrophy indicating close proximity to the gene. However, delineation of the precise region containing the SMA gene was not possible based on this information alone. A PAC contiguous array containing the CATT region comprised of 9 clones and extending approximately 400 kb was constructed. The genetic analysis combined with the physical mapping data indicated that the 154 kb PAC clone 125D9 (FIG. 7) which contained the CMS allele 9 and the 40G1 CATT sublocus had a good probability for containing the SMA locus. Through further analysis as will be described, PAC 125D9 was found to contain the gene encoding neuronal apoptosis inhibitor protein.

pYAC (yeast artificial chromosome plasmids) allow direct cloning into yeast of contiguous stretches of DNA ≦400 kb. Circular pYAC plasmids (without inserts) can replicate in E. coli. In vitro digestion of pYAC, ligation to exogenous DNA, and direct transformation of the subsequent linear molecules (with telomeric sequences at each terminus) into yeast generates a library that can be screened by standard techniques.

Large YAC constructs are as stable as natural chromosomes. They are good vectors for the construction of libraries from complex genomes such as the human genome. In addition, sequences which are unclonable in E. coli cosmid and lambda vectors are successfully cloned in YAC vectors.

YAC vectors are normally propagated in bacteria as circular plasmids. Restriction enzyme target sites are arranged to produce two arms upon digestion, each of which contains a different selectable marker and terminates at one end in a telomere, the other in a blunt end. In addition, one of the arms contains an ARS element. The two arms are purified away from a linking fragment and ligated with donor DNA fragmented so as to leave blunt ends. The ligation mixture is used to transform yeast cells, and the selection conditions are such as to require the presence of both arms; the insert interrupts a third selectable marker which allows non-recombinant structures to be recognized.

Construction of YAC Contig

YAC clones were isolated from three libraries, constructed at the National Centers of Excellence (NCE, Toronto), the Imperial Cancer Research Fund (ICRF, London) (Larin et al., 1991) and the Centre d'Etude du Polymorphisme Humaine (CEPH, Paris) (Albertson et al., 1990), all of which were prepared from partial EcoRI digests of total DNA ligated into the YAC vector pYAC4. ICRF YAC clones were identified by probing library filters with 5q13.1 probes. YAC DNA from the NCE library was screened by PCR amplification, electrophoresed, immobilized onto Southern blots and hybridized with the radiolabelled STS product to identify positives. Numerous positives were obtained repeatedly in both the initial round of PCR of pooled plates, and the second round with the plate(s) thought to contain the clone of interest, many of which proved to be false positives. The number of false positives obtained, which appeared to be primer dependent, was reduced by radiolabelling PCR products and resolving these on 6% polyacrylamide gels. The true positives could then be sized accurately without interference from spurious products.

Yeast strains with YACs positive for 5q13.1 STSs were grown on selective plates and examined for stability in the following manner: 4 colonies of each were grown for preparation in agarose blocks, yeast chromosomal DNA was separated by pulsed field gel electrophoresis and transferred to filters, and the size and number of YAC clones contained within each yeast colony was determined by hybridization with radiolabelled total human genomic DNA. Positive clones were confirmed either by hybridization or PCR amplification with the original probe. Only YAC 24D6-2 contained some colonies with more than one YAC.

YAC end clones and inter-Alu products were isolated by vector-Alu PCR and inter-Alu PCR respectively. The location of these products within 5q11-13 was confirmed by hybridization to Southern filters of the somatic cell hybrids HHW105 (Dana et al., 1982), containing the entire chromosome 5, and HHW1064 (Gilliam et al., 1989), a derivative containing chromosome 5 with a deletion at 5q11.2-13.3. Many of these probes demonstrated hybridization profiles indicative of locations both within the 5q11-13 region and elsewhere on chromosome 5. In some cases primers specific for the ends of each YAC were generated from the sequences of YAC end clones isolated by vector-Alu PCR. The mapping of each new STS to 5q11-13 was determined by PCR amplification of DNA from the somatic cell hybrids HHW105 and HHW1064. In a few cases it was found that a primer pair contained a chromosome 5 repetitive sequence, as the PCR amplified products from both HHW1064 and HHW105 were positive. Formulation of new STS primers resulted in the amplification of products specific to the 5q11-13 region. End clone hybridization and STS analysis performed on all YACs confirmed the orientation and location of each YAC.
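The interpretation of the two-hybrid PCR test described above can be sketched as a small decision function. This is only an illustration of the logic stated in the text (HHW105 carries the whole of chromosome 5, HHW1064 carries chromosome 5 with the 5q11.2-13.3 deletion); the function name and the returned labels are hypothetical and are not part of the disclosure.

```python
def classify_sts(hhw105_positive: bool, hhw1064_positive: bool) -> str:
    """Interpret PCR results for one STS on the two somatic cell hybrids.

    HHW105  : hybrid containing the entire chromosome 5.
    HHW1064 : hybrid containing chromosome 5 deleted for 5q11.2-13.3.
    The labels returned here are illustrative only.
    """
    if hhw105_positive and not hhw1064_positive:
        # Present on chromosome 5 but lost with the deletion.
        return "specific to the deleted 5q11.2-13.3 interval"
    if hhw105_positive and hhw1064_positive:
        # Amplifies even when the interval is deleted: the primer pair
        # detects sequence elsewhere on chromosome 5 (e.g. a repeat).
        return "not specific to the interval; redesign primers"
    if hhw1064_positive:
        # Positive only on the derivative hybrid is not expected.
        return "inconsistent result; repeat the assay"
    return "no chromosome 5 signal"


if __name__ == "__main__":
    print(classify_sts(True, False))
    print(classify_sts(True, True))
```

A primer pair in the second category corresponds to the case in the text where both hybrids were positive and new STS primers had to be formulated.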

The assembly of a contiguous array of YACs covering the SMA interval was initiated from two markers which flank SMA: D5S125 (Mankoo et al., 1991), which lies centromeric to D5S435, and the more telomeric marker D5S112 (Lien et al., 1991) (see FIG. 1). Six YACs were identified in the ICRF library by the telomeric marker pJK53 (D5S112). One of these YACs, D06100, was shown to extend the furthest centromerically based on end clone STS analysis. The centromeric end of this YAC identified two YACs from the NCE library, 1281 and 1284. YACs positive for the D5S125 or D5S435 markers were not found in the ICRF or NCE library; thus the CEPH library was screened, from which clones containing D5S435 were isolated. A microsatellite polymorphism mapping into the center of the gap, CATT-1 (Burghes et al., 1994), was utilized to detect three YACs, 24D6-2, 27H5 and 33H10. These YACs were shown to be linked to both the centromeric and the telomeric YACs (1281, 1284) by STS analysis. Internal YAC products generated by Alu-PCR were utilized to probe all YACs, establishing the degree of overlap. STS sequences (Kleyn et al., 1993) mapping between JK348 and D5S112 were utilized to confirm the degree of overlap and the orientation of YACs in the contig. Concurrently the order of each STS along 5q13 was confirmed. In all, a total of 14 YACs were identified, anchored by the genetic markers D5S435, D5S629, CMS-1, CATT-1, D5F153, D5F149, D5F150, D5F151, D5S557 and D5S112.

Long Range Restriction Map and Estimation of Long Range Physical Distance

A restriction map of the critical SMA region was constructed from the STS Y116U (Kleyn et al., 1993), approximately 100 kb proximal to D5S629, to the STS Y107U (Kleyn et al., 1993), which lies approximately 500 kb distal to D5S557 (see FIG. 2). In order to detect any possibility of deletions or rearrangements in our YACs, additional YACs isolated from the CEPH library (Kleyn et al., 1993) mapping within this region were included in the analysis. YACs 24D62, 27H5, 33H10, 155H11, 76C1, 235B7, 184H2, 428C5, and 81B11 (Kleyn et al., 1993) were partially digested utilizing the rare cutter restriction endonucleases NotI, BssHII, SfiI, and RsrI. Southern blots of the pulsed field gel electrophoresis (PFGE) separated restriction products were hybridized with YAC left arm and right arm specific probes, which revealed the positions of cleavage sites from both ends of each YAC. The orientation and overlap of the YACs had been previously determined based on STS analysis; therefore the positions of the rare cutter sites among the overlapping YACs were compared. By aligning the overlapping YACs at their common rare cutter sites, the degree of overlap could be more precisely determined. The long range restriction map of the overlapping YACs derived from different sources was mostly in agreement, with the exception of 33H10 and 428C5. 428C5 has previously been documented to contain a deletion (Kleyn et al., 1993), evident by comparison of its STS content and its size of only 300 kb, indicating that it lies further centromeric than its placement in FIG. 2. YAC 33H10, based on STS analysis, contains an internal deletion and YAC 155H11 is chimeric at its telomeric end; therefore rare cutter sites at the telomeric end of the map which could not be confirmed were not included. The results indicate the distance from the centromeric boundary D5S435 to the telomeric boundary D5S557 to be 1.4 Mb, in marked contrast to 400 kb as previously reported (Francis et al., 1993) but in agreement with one other estimate (Wirth et al., 1993). Furthermore, the D5S629-D5S557 interval can be estimated at 1.1 Mb and the distance of the genetically defined CMS1-SMA-D5S557 interval is approximately 550 kb.

Cosmid Contig Assembly from the Chromosome 5 Library

Although the isolation of cosmids utilizing whole YACs as probes could be an expeditious method of constructing a cosmid contig, in this case the presence of chromosome 5 specific repeats would likely result in the isolation of cosmids mapping elsewhere on chromosome 5. A directed cosmid walking strategy was thus adopted. The CATT-1 STR, which has been shown by irradiation hybrid analysis to map approximately midway between the two flanking markers D5S435 and D5S351 (Hudson et al., 1992), was utilized as the initiation point for the construction of a cosmid clone array. The complex pattern of amplification seen on genomic DNA, with two to eight alleles per individual (see FIG. 3), suggested a variable number of copies or loci of the CATT-1 sequence in this region. Thirty CATT-1 positive cosmids were identified which upon PCR analysis were seen to contain one of four distinct alleles (see FIG. 3). As the cosmid library was derived from a monochromosomal source, this confirmed that the CATT STR exists at least in four locations, which we refer to as subloci. These subloci are referred to as CATT-40G1, CATT-192F7, CATT-58G12 and CATT-250B6, based on the cosmid addresses of the first cosmids identified containing alleles of 12, 19, 15 and 20 cytosine adenosine (CA) dinucleotides respectively. Bi-directional walking was initiated from these 4 cosmid subloci. Positive hybridization was observed for cosmid 250B6 with one end of 58G12 and for 192F7 with the other end, resulting in the ordering of cen-192F7-58G12-250B6-tel (FIG. 4). All cosmids which contained the CATT-192F7 allele were mapped to this location based on the size of their CATT-1 allele and their restriction enzyme profiles. As shown in FIG. 4, the CATT-192F7 sublocus is telomeric to the STR CMS-1, which itself lies telomeric to the CATT-40G1 sublocus.

Due to the presence of chromosome 5 specific repetitive sequences, resulting in the identification of cosmids from another region of chromosome 5, the integrity of the contig was verified with each step taken. Cosmid end clones generated by vector-Alu-PCR were hybridized to somatic cell hybrid panels as described above. As repetitive sequences which map solely to the region of chromosome 5 that is deleted in the hybrid cell line HHW1064 have been observed, cosmids identified by end products which did not hybridize to HHW1064 were analyzed further. Proof of overlap was shown by hybridization of end clones, single copy probe hybridization, STS content, and restriction enzyme profile comparison. Cosmids identified by end clones which hybridized to HHW1064 were eliminated and walking was continued by utilizing a different inter-Alu product from the clone of origin, which was verified in the same manner. Cosmid sizes were calculated by the addition of EcoRI restriction fragments and the extent of overlap was determined by the addition of those fragments in common.
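The size and overlap arithmetic in the last sentence can be illustrated with a minimal sketch. The fragment sizes below are invented for illustration and are not taken from the restriction map; the function names are likewise hypothetical. Fragments are matched purely by size, which is an assumption of this sketch: two different fragments that happen to co-migrate would be counted as shared, which is one reason the text also relies on end clone hybridization, STS content and single copy probes.

```python
from collections import Counter


def cosmid_size_kb(fragments_kb):
    """Insert size estimated as the sum of its EcoRI fragment sizes (kb)."""
    return sum(fragments_kb)


def overlap_kb(fragments_a_kb, fragments_b_kb):
    """Overlap estimated as the sum of the fragments the two cosmids share.

    Counter intersection keeps each size as many times as it occurs in
    both lists, so a size seen twice in one cosmid and once in the other
    is counted once.
    """
    shared = Counter(fragments_a_kb) & Counter(fragments_b_kb)
    return sum(size * count for size, count in shared.items())


if __name__ == "__main__":
    # Hypothetical EcoRI fragment sizes (kb) for two overlapping cosmids.
    cosmid_a = [12.0, 8.0, 5.5, 3.0, 1.5]
    cosmid_b = [8.0, 5.5, 3.0, 9.0, 4.0]
    print(cosmid_size_kb(cosmid_a))        # total size of cosmid A
    print(overlap_kb(cosmid_a, cosmid_b))  # size of the shared fragments
```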

Cosmid Contig Assembly of YAC 76C1 Cosmids

As extension of the cosmid contiguous array was prevented by the presence of chromosome 5 specific repeats, a 5× cosmid library was produced from YAC 76C1. The STSs CATT-1, CMS-1, Y122T (Kleyn et al., 1993), Y97T (Kleyn et al., 1993) and Y98T (Kleyn et al., 1993), which are distributed along the YAC, were utilized to identify cosmids to assemble the contig. As well, the previously developed markers pZY8, pL7, pGA-1, p15.1, p402.1, p2281.8 and β-glucuronidase (Oshima et al., 1987) (Table 2, FIG. 4) from the established cosmid contig were hybridized to the library, providing an effective method of ordering the cosmids. Cosmids demonstrating irregular hybridization patterns and thought to contain deletions and/or rearrangements were excluded.

The STS Y98T identified three cosmids including one previously identified by the probe p2281.8, derived from a chromosome 5 library clone, 228C8, also containing the STS Y98T. An end product of this cosmid hybridized to ten cosmids. Concurrently, an end fragment of a CATT-40G1 sublocus was shown to hybridize to four of these ten cosmids, thus linking CATT-40G1 and CMS-1 with the more centromeric STS Y98T (FIG. 4). We were unable to identify any clones containing the YAC end STS Y97T. Filter hybridization and STS mapping experiments indicated a second more telomeric location of the CATT-40G1 sublocus. A duplication of this sublocus would agree with genotype data in our SMA kindreds (McLean et al., in press).

An EcoRI restriction map was generated utilizing a minimal set of cosmids necessary to span the region. To ensure the reliability of the contig, we sought to integrate it with the contig constructed from the chromosome 5 specific library. Concordance of the contigs was evident by comparison of the restriction maps, the position of probes and STSs on the map and Alu-PCR fingerprinting. In this manner the size of the contig was estimated to be 210 kb. A directed walking strategy has thus resulted in the generation of a single contiguous set of cosmids containing the CATT-1 cluster of subloci with known centromere/telomere orientation.

Duplications/Deletions

Several lines of evidence suggested the presence of genomic sequence duplications within our cosmid array. We provide evidence for the duplication of the CATT-40G1 sublocus in cosmids derived from a single chromosome 5. A centromeric location for this sublocus was established, as the CATT-40G1 sublocus was found to be contiguous with the STSs Y122T, Y88T and CMS-1 in several cosmids, and the centromeric YAC 428C5 is positive for probes isolated from the CATT-40G1 containing cosmids. Although YAC 428C5 does not contain the CATT-40G1 sublocus upon PCR amplification, this may be explained either by a null allele in the chromosome from which the YAC was derived or a deletion in the YAC. We have previously observed null alleles in individuals at distinct CATT-1 subloci. A second more telomeric location of CATT-40G1 was determined by the hybridization to CATT-40G1 cosmids of the probes pGA-1, pL7, and pZY8, all of which bind the more telomeric YACs 33H10 and 24D62. The hybridization of p402.1, derived from cosmid 40G1, to cosmids at both locations would indicate that the duplication is not restricted to the CATT-40G1 subloci and likely encompasses a larger region. Southern blot analysis revealed distinct profiles of cosmids for the two locations; however, common bands were detected by Alu-PCR fingerprinting, supporting a duplication.

Correlation of our YAC contig with the cosmid contig revealed that YACs 76C1, 81B11, and 27H5 span the 150 kb CATT region of 5q13. Despite this, CATT-1 genotyping of these YACs revealed only one allele size, raising the possibility that the chromosomes from which these YACs were derived (4 in all) contain null alleles at their remaining CATT-1 subloci. Our experience, however, with CATT linkage analysis of SMA families indicated that such a scenario is highly unlikely, as none of the approximately 300 individuals genotyped had fewer than 2 alleles. We consequently believe it is more likely that these CATT subloci are unstable and have been deleted during YAC construction and/or propagation.

Sequence comparison between the CATT-1 and D5F153 primer sequences indicated that these two STRs were similar and possibly the same, as one primer is identical and the other primer sequences overlap by eight nucleotides. However, the centromeric YACs 428C5, 232F12, 235B7, 184H2, and the telomeric YACs 12H1, 155H11, 269A6, which were CATT-1 negative, yielded D5F153 amplification products, indicating that CATT-1 may be a derivative of D5F153. These data, in combination with D5F153 analyses of the cosmid contig, which contains three D5F153 loci (FIG. 4), indicated that at least five D5F153 subloci exist.

In addition to the CATT-1 and D5F153 STRs, the STRs CMS-1 and D5F150 were present in a variable number of copies per chromosome 5. STS analysis localized CMS-1 to YACs 428C5, 76C1, 81B11 and 27H5 with allele sizes of 5, 4, 4 and 3, and 4 respectively. PCR amplification of genomic DNA revealed up to four alleles per individual, indicating as many as two copies per chromosome. D5F150 was present at two locations within the cosmid array, yet only one location was detected in the YAC contig. D5F151 was not detected within our cosmid array; nevertheless it was placed at the centromeric end of YAC 33H10, which encompasses the cosmid array, based on the positive amplification of YAC 428C5. One location of D5F149 was detected on both our cosmid and YAC clones. Our data suggested, as with CATT-1, the existence of null alleles and/or instability of the CMS-1, D5F150, D5F151, D5F149 sequences in YACs.
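The inference from allele counts to copy number can be written out as a short calculation. The sketch below assumes a diploid individual with two copies of chromosome 5, so that observing n distinct alleles implies at least one chromosome carrying ceil(n/2) copies of the repeat; the function name is hypothetical.

```python
import math


def min_copies_on_one_chromosome(max_alleles_per_individual: int) -> int:
    """Lower bound on STR copies carried by at least one chromosome.

    A diploid individual has two copies of chromosome 5, so n distinct
    alleles cannot all fit unless one chromosome carries at least
    ceil(n / 2) copies of the repeat.
    """
    return math.ceil(max_alleles_per_individual / 2)


if __name__ == "__main__":
    # CMS-1: up to four alleles per individual, as stated in the text.
    print(min_copies_on_one_chromosome(4))
    # CATT-1: up to eight alleles per individual, as stated in the text.
    print(min_copies_on_one_chromosome(8))
```

For CMS-1 this gives two copies per chromosome, as stated above, and for CATT-1 it gives four, which agrees with the four CATT-1 subloci found in the monochromosomal cosmid library.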

A deletion event was observed in hybridization with an 800 bp EcoRI fragment isolated as a single copy probe from the CATT-40G1 containing cosmid 234A1 from the chromosome 5 specific cosmid library. Probings of YAC DNA failed to detect this fragment in any of our YACs. Hybridization to genomic DNA of several individuals did not identify any deletion events; thus this sequence may be susceptible to instability in the YACs. Sequencing of this fragment did not reveal any exons or coding region.

Further evidence of sequence duplication in the SMA region was identified with a 1.2 kb internal Alu-PCR product (p151.2) from cosmid 15F8 (FIG. 4). The probe identified three EcoRI fragments in YAC clones 76C1, 81B11 and 27H5 (20 kb, 12 kb and 3 kb) but only one in 33H10 and 24D6 (20 kb) and one in 428C5 (12 kb). An internal EcoRI site divided this marker into 500 bp and 700 bp probes. The larger probe identified the 12 kb and 20 kb fragments while the smaller probe identified the 3 kb and 20 kb fragments (FIG. 5). We ruled out instability of this sequence in YACs, as they are from different libraries and the hybridization patterns reflected their physical location. The 12 kb and 3 kb fragments were localized on the EcoRI restriction map; however, we were unable to position the 20 kb fragment. Taken together these findings suggest the 12 kb and 3 kb lie in tandem with a centromeric/telomeric orientation respectively. A location of the 20 kb fragment distal to our contiguous array of cosmids may be inferred from the data. The duplication was confirmed by hybridization to genomic DNA digests revealing all three fragment sizes.

YAC Contig and Cosmid Contig Characteristics

We established a YAC contig of the SMA disease gene region, incorporating the D5S435-D5S112 interval and encompassing 4 Mb. Orientation of the contig along 5q13 was confirmed by analysis of seven genetic markers and STSs in combination with PFGE analysis. The long range restriction map revealed neither major deletions nor rearrangements among the YACs within our contig, and was utilized to refine the estimates of the size of the contig. Our YAC map establishes physical linkage of the markers D5S629, D5F153, D5F151, D5F150, D5F149, CMS-1, CATT-1 and D5S557 to a 1.1 Mb region, a region of the genome characterized by low copy repetitive sequences and multilocus STRs. Furthermore, we estimated the new genetically defined CMS1-SMA-D5S557 interval to be 550 kb. Estimates of the physical distance of the D5S435-D5S557 interval ranging from 400 kb (Francis et al., 1993) to 1.4 Mb (Wirth et al., 1993) have been reported. In contrast to these studies, our estimation of 1.4 Mb for the D5S435-SMA-D5S557 interval and 550 kb for the CMS1-SMA-D5S557 interval employs clones derived from three sources, comprised of 6 chromosomes. Moreover, the determination of both the size of clones and the position of rare cutter sites has enabled us to determine more precisely the extent of overlap of the YACs and the size of the contig, providing a reliable estimation.

We also assembled a single contiguous array of cosmid clones derived from both a chromosome 5 specific library and a YAC (76C1) specific library in conjunction with a restriction map of the CMS-1/CATT-1/D5F153/D5F150/D5F149 region encompassing 210 kb. The repetitive sequences prevented extension of the cosmid contig when utilizing a chromosome 5 specific library, necessitating construction of a cosmid library from YAC 76C1 in the critical region. The contiguous cosmid array was constructed by a directed walking strategy with validation of cosmid overlap established by restriction enzyme fragment overlap, Alu fingerprinting, and analyses involving STSs, cosmid end clones and single copy probes.

Physical and genetic mapping analyses revealed a complex region of genomic DNA comprising duplications and the presence of repetitive sequences. Genotyping of genomic DNA with complex STRs from this region revealed the presence of a polymorphic number of bands ranging as high as eight per individual. This suggested the presence of multiple copies, or subloci, for the STRs CATT-1, CMS-1, D5F153, D5F150. Our physical mapping data confirmed the presence of these subloci except in the case of D5F151 and D5F149, which revealed only one location. Four of the CATT-1 subloci map to our cosmid array within a 140 kb region; at least one of these subloci, CATT-40G1, is duplicated. D5F153 and CATT-1 are related STRs which appear to have diverged from a common ancestor. We had localized one CMS-1 sublocus to our cosmid array; however, we were unable to determine from our data whether other subloci exist on other chromosomes within this 200 kb interval, as the chromosomes from which the YAC/cosmid libraries were derived may either contain null alleles at the remaining subloci or have sustained deletions.

The CATT-1, D5F153, D5F150 and D5F149 STRs, although present in multiple copies on chromosomes in the population, were observed as single sublocus markers on all YACs, as evidenced by single allele PCR products for each, suggesting instability and deletion of these sequences. This is supported by the absence in our YACs of an 800 bp fragment, derived from the chromosome 5 cosmid library based contiguous array. Instability of these sequences does not appear to result in large deletions, as additional unique sequence probes located between the multiple subloci are retained in the YACs.

In summary, we have produced the first high resolution physical map of the critical SMA region. However, delineation of the precise region which contained the SMA gene was not possible based on this information alone.

Concurrent with our genetic analysis, we constructed a YAC contiguous array employing clones from three different YAC libraries (Roy et al., 1994). A minimal representation from this array, which was correlated with extensive pulsed field gel electrophoresis (PFGE) analysis, is shown in FIG. 9B.

With the initial suggestion of linkage disequilibrium of the general CATT marker and SMA (Burghes et al., 1994), the construction of a cosmid contiguous array incorporating the extended CATT region was undertaken. The presence of extensive and polymorphic genomic repetitive elements mapping both to 5q13 and elsewhere on chromosome 5 interfered with a straightforward assembly of a contiguous array. However, the integrity of the array was established by restriction enzyme analyses, Alu-PCR fingerprinting, STS content determination and nucleic acid hybridization using cosmid end clones and other single copy probes. This resulted in the generation of an array encompassing 220 kb that contained the five CATT subloci contained in a monochromosomally derived flow sorted chromosome 5 genomic library (Roy et al., 1994). More recently, a P1 artificial chromosome (PAC, Ioannou et al., 1994) contiguous array containing the CATT region, comprised of 10 clones and extending approximately 550 kb, was constructed (FIG. 9C).

Linkage Disequilibrium Analysis

A linkage disequilibrium analysis employing 5 complex and simple tandem repeats mapping to the SMA region was conducted. Two of the polymorphisms employed in this analysis were the CATT-40G1 and CATT-192F7 subloci which we mapped to our cosmid array. Specific amplification of the two individual subloci was achieved by constructing primers ending on sequence polymorphisms in the region flanking the CA repeat. A clear linkage disequilibrium peak was observed at the CATT-40G1 sublocus as shown in FIG. 6.
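The text does not state which statistic was used to score disequilibrium at each marker. Purely as an illustration of how such a peak could be quantified, the sketch below computes the standard coefficient D and its normalized form D' from a two-by-two table of chromosome counts (marker allele present or absent, on SMA or normal chromosomes). The choice of D and D', the function name and the counts in the example are all assumptions and are not data from this disclosure.

```python
def linkage_disequilibrium(n_allele_sma, n_allele_normal,
                           n_other_sma, n_other_normal):
    """Return (D, D') for one marker allele versus disease status.

    Arguments are chromosome counts:
      n_allele_sma    : SMA chromosomes carrying the marker allele
      n_allele_normal : normal chromosomes carrying the marker allele
      n_other_sma     : SMA chromosomes carrying any other allele
      n_other_normal  : normal chromosomes carrying any other allele
    """
    total = n_allele_sma + n_allele_normal + n_other_sma + n_other_normal
    p_allele = (n_allele_sma + n_allele_normal) / total  # allele frequency
    p_sma = (n_allele_sma + n_other_sma) / total         # SMA chromosome frequency
    p_joint = n_allele_sma / total                       # allele on SMA chromosome

    d = p_joint - p_allele * p_sma
    if d >= 0:
        d_max = min(p_allele * (1 - p_sma), (1 - p_allele) * p_sma)
    else:
        d_max = min(p_allele * p_sma, (1 - p_allele) * (1 - p_sma))
    d_prime = d / d_max if d_max > 0 else 0.0
    return d, d_prime


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(linkage_disequilibrium(40, 10, 10, 40))
```

Computing D' for each marker in map order and plotting the values is one way a disequilibrium peak such as that described for CATT-40G1 could be visualized.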

PAC Contig Array

Since the 40G1 CATT subloci demonstrated linkage disequilibrium, a PAC contiguous array containing the CATT region was constructed. This PAC contig array comprised 9 clones and extended approximately 400 kb (FIG. 7). Our genetic analysis combined with the physical mapping data indicated that the 40G1 CATT subloci marker which showed the greatest disequilibrium with SMA was duplicated and was localized at the extreme centromeric end of the critical SMA interval. Consequently the 154 kb PAC clone 125D9, which contained within 10 kb of its centromeric end the SMA interval defining CMS allele 9 and extended telomerically to incorporate the 40G1 CATT sublocus, was chosen for further examination.

Two genomic libraries were constructed by performing complete and partial (average insert size 5 kb) Sau3AI digests on PAC 125D9 and cloning the restricted products into BamHI digested Bluescript plasmids. Genomic sequencing was conducted on both termini of 200 clones from the 5 kb insert partial Sau3AI library in the manner of Chen et al. (1993), permitting the construction of contiguous and overlapping genomic clones covering most of the PAC. This proved instrumental in the elucidation of the neuronal apoptosis inhibitor protein gene structure.

PAC 125D9 is cleaved into 30 kb centromeric and 125 kb telomeric fragments by a NotI site (which was later shown to bisect exon 7 of the PAC 125D9 at the beginning of the apoptosis inhibitor domain). The NotI PAC fragments were isolated by preparative PFGE and used separately to probe fetal brain cDNA libraries. Physical mapping and sequencing of the NotI site region was also undertaken to assay for the presence of a CpG island, an approach which rapidly detected coding sequences. The PAC 125D9 was also used as a template in an exon trapping system, resulting in the identification of the exons contained in the neuronal apoptosis inhibitor protein gene.

The multipronged approach, in addition to the presence of transcripts identified previously by hybridization by clones from the cosmid array (such as GA1 and L7), resulted in the rapid identification of six cDNA clones contained in the neuronal apoptosis inhibitor protein gene. The clones were arranged, where possible, into overlapping arrays. Chimerism was excluded on a number of occasions by detection of co-linearity of the cDNA clone termini with sequences from clones derived from the PAC 125D9 partial Sau3AI genomic library.

Cloning of Neuronal Apoptosis Inhibitor Protein Gene

In the meantime, a human fetal spinal cord cDNA library was probed with the entire genomic DNA insert of cosmid 250B6 containing one of the 5 CATT subloci. This resulted in the detection of a 2.2 kb transcript referred to as GA1, the location of which is shown in FIG. 7. Further probings of fetal brain libraries with the contiguous cosmid inserts (cosmids 40G1) as well as single copy subclones isolated from such cosmids were undertaken. A number of transcripts were obtained including one termed L7. No coding region was detected for L7, probably due to the fact that a substantial portion of the clone contained unprocessed heteronuclear RNA. However, we later discovered that L7 comprises part of what is believed to be the neuronal apoptosis inhibitor protein gene. Similarly, the GA1 transcript ultimately proved to be exon 13 of the neuronal apoptosis inhibitor protein. Since GA1 was found to contain exons, indicating that it was an expressed gene, it was of particular interest. The GA1 transcript, which was contained within the PAC clone 125D9, was subsequently extended by further probing in cDNA libraries.

The extended GA1 transcript was compared to other known sequences to reveal that its amino acid sequence had significant homology to the inhibitor of apoptosis polypeptides of Orgyia pseudotsugata and Cydia pomonella viruses (Table 3). This sequence analysis revealed the presence of inhibitor of apoptosis protein homology in exons 5 and 6.

The remaining gaps in the cDNA were completed and the final 3' extension was achieved by probing a fetal brain library with two trapped exons. A physical map of the cDNA with overlapping clones was prepared. The entire cDNA sequence (SEQ ID NO: 1) is shown in Table 4 and contains sixteen exons. The amino acid sequence (SEQ ID NO: 2) starts with methionine which corresponds to the nucleotide triplet ATG. FIG. 8 demonstrates the structural organization of the SMA gene.

The cDNA sequence of NAIP (SEQ ID NO: 1) shown in Table 4 allows one skilled in the art to develop from this gene primers, probes and also antibodies against the protein product. The cDNA sequence (SEQ ID NO: 1) of Table 4 may be used in recombinant DNA technology to express the sequence in an appropriate host in order to produce the neuronal apoptosis inhibitor protein. In this manner, a source of neuronal apoptosis inhibitor protein is provided. Given the sequence of NAIP and the probes and primers therein, deletions in the sequence may also be detected, for instance, in the disorder Spinal Muscular Atrophy.

NAIP Structure

The NAIP gene contains 17 exons comprising at least 5.5 kb and spans an estimated 80 kb of genomic DNA. The NAIP coding region spans 3698 nucleotides resulting in a predicted gene product of 1233 amino acids (SEQ ID NO: 2). NAIP contains two potential transmembrane regions and an intracellular inhibitor of apoptosis domain immediately contiguous with a GTP binding site. Searches of the protein domain programs generated the following results:

(i) residues 9-91: an N terminal domain with no recognizable motifs.

(ii) residues 94-118: hydrophobic potential membrane spanning domain.

(iii) residues 169-485: a domain which shows homology with apoptosis inhibitors, followed, immediately before the next hydrophobic domain, by a GTP/ATP binding site.

(iv) residues 486-504: a hydrophobic potential membrane spanning domain.

(v) residues 505-1005: possible receptor domain containing 4 N-linked glycosylation sites and a lipoprotein binding domain.

Neuronal Apoptosis Inhibitor Protein Gene Mutational Analysis

The cDNA probe 20.3 was found by using the entire PAC 125D9 as a probe to screen cDNA libraries. Probing of genomic Southern blots with cDNA probe 20.3 revealed the absence of a 9 kb EcoRI band in a Type III consanguineous family. This information mapped the NAIP gene deletions to exons 5 and 6. Thus the deletion covers the exon containing the rare NotI restriction site and the exon immediately downstream. Primers in and around these exons were constructed, revealing the absence of amplification from 3 Type I and 3 Type III SMA individuals. Genomic DNA was isolated from PAC and cosmid subclones in and around exons 4 and 5 and sequenced in an effort to generate primers which would amplify the junction fragment generated by the causative deletions as depicted. A junction fragment was detected in the Type III individual. A similar product was observed in two other French Canadians with no history of consanguinity. The chromosomes of the 3 Type I and 3 Type III SMA individuals had identical CATT/CMS haplotypes, strongly suggesting that this is a common mild SMA mutation and comparatively frequent in the French Canadian population. Cosegregation of this pattern was demonstrated. We have conducted analysis of 110 parents of SMA individuals and have failed to find a similar product. Sequencing of the genomic DNA in this region revealed an approximately 10 kb deletion resulting in an in frame deletion. This deletion spans intron regions and exons 5 and 6. Southern blot analysis of two generation SMA families was performed. A cDNA probe encompassing the first eight exons was hybridized to EcoRI-digested DNA from peripheral blood leukocytes. SMA affected members show an absence of hybridization to a 10 kb EcoRI band which was shown to contain exons 5 and 6 (FIG. 9).

Initial isolation of the NAIP transcript was achieved by probing a human fetal brain cDNA library with the entire 28 kb genomic DNA insert of cosmid 250B6 that contains one of five CATT subloci present in the cosmid library. This resulted in the detection of a 2.2 kb transcript that ultimately proved to be exon 14 of the NAIP gene. Further probing of fetal brain libraries with the contiguous cosmid inserts (cosmid 40G1), as well as single copy subclones isolated from such cosmids, identified a number of transcripts including the L7 transcript that ultimately proved to contain exon 13 of the NAIP locus. No coding region was detected for L7, probably due to the fact that a substantial proportion of the clone contained unprocessed heteronuclear RNA, obscuring its true nature.

At this stage, the completed genetic and linkage disequilibrium analyses and construction of the PAC contiguous array identified PAC 125D9 as having a good probability of containing the SMA locus. Four PAC 125D9 genomic libraries were constructed by performing complete and partial (average insert size 5 kb) Sau3AI, BamHI and BamHI/NotI digests on the PAC insert and cloning the restricted products into plasmid vector. High-throughput genomic sequencing was conducted on both termini of 200 clones from the 5 kb insert partial Sau3AI digestion library in the manner of Chen et al. (1993), permitting the construction of contiguous and overlapping genomic clones covering most of PAC 125D9 (data not shown). This has proven instrumental in elucidating the gene structure of the NAIP locus.

PAC 125D9 is divided into 24 kb centromeric and 130 kb telomeric fragments by NotI digestion, bisecting exon 6 of the NAIP gene at the beginning of the first potential transmembrane domain mapping upstream of the inhibitor of apoptosis homologous domains (FIG. 11 and Table 4). The NotI PAC fragments were isolated by preparative PFGE and used separately to probe human fetal brain cDNA libraries. Physical mapping and sequencing of the NotI site region was also undertaken to assay for the presence of a CpG island, an approach that rapidly detected coding sequence. The PAC was also used as a template in an exon trapping system (Church et al., 1994), resulting in the identification of the NAIP gene exons 5, 12, 16 and 17.

This multi-pronged approach resulted in the identification of cDNA clones spanning the NAIP gene (FIG. 10). Overlapping clones were identified and chimerism of cDNA clones was excluded on a number of occasions by the detection of co-linearity of the cDNA clone termini with sequence from clones of the PAC 125D9 partial Sau3AI digestion genomic library. At this time, sequence analysis revealed the similarity between the protein sequence encoded by the NAIP gene exons 7 through 13 and two baculoviral inhibitor of apoptosis proteins (IAPs). Shortly thereafter, probing of Southern blots containing DNA from consanguineous SMA families with cDNA probes revealed deleted bands.

Both IAPs contain in their amino terminus an 80 amino acid BIR (baculovirus IAP repeat) motif that, after an intervening sequence of approximately 30 residues, is duplicated with 33% identity (Clem and Miller, 1993). The same phenomenon is observed in NAIP; amino acids 185-250 encoded by exons 6, 7 and 8 are 35% homologous to amino acids 300-370 encoded in exons 10, 11 and 12. The greatest stretch of homology is observed over a 53 amino acid region with 29 identical amino acids.
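The identity figures quoted above are simple ratios of matching positions to aligned length. The following is a minimal sketch, assuming a gap-free alignment of equal-length sequences; the function name and the toy peptides are invented for illustration and are not NAIP or IAP sequence.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of aligned positions carrying the same residue.

    Assumes a gap-free alignment, so both strings must have equal length.
    """
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b)
    return 100.0 * matches / len(seq_a)


# Toy peptides, invented for illustration: 5 of 6 positions match.
toy_identity = percent_identity("ACDEFG", "ACDQFG")

# Figures quoted in the text: 29 identical residues over a 53-residue stretch.
best_stretch_identity = 100.0 * 29 / 53

print(f"toy: {toy_identity:.1f}%  53-residue stretch: {best_stretch_identity:.1f}%")
```

On the figures in the text, the 53 amino acid stretch works out to roughly 55% identity, well above the 35% reported for the two repeats as a whole.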

In addition to the NH2 terminal IAP domain, there exist cysteine and histidine rich zinc finger-like motifs in the carboxy terminus of both CpIAP and OpIAP. These motifs, which are proposed to interact with DNA (Birnbaum et al., 1994), are not seen in NAIP (Table 4). NAIP contains two potential transmembrane regions that bracket an inhibitor of apoptosis domain and a contiguous GTP binding site. Additional searches with protein domain programs generated the following results, which are more specific than the aforementioned protein domain evaluation:

1. Residues 1-91: an N terminal domain with no recognizable motifs;

2. Residues 92-110: a hydrophobic domain predicted by the MEMSAT program (Jones et al., 1994) to be a membrane spanning domain;

3. Residues 163-477: a domain that shows homology with baculoviral inhibitors of apoptosis proteins followed by, and immediately upstream of the next hydrophobic domain, a GTP/ATP binding site;

4. Residues 479-496: a hydrophobic domain predicted by MEMSAT to be a membrane spanning domain;

5. Residues 497-1232: a possible receptor domain containing four N-linked glycosylation sites and a procaryotic lipid attachment site.
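The five residue ranges listed above can be held as a small lookup table, which makes the gaps between the listed domains explicit (residues 111-162 and residue 478 fall in none of them). This is an illustrative sketch only; the names NAIP_DOMAINS and domain_of are hypothetical and the descriptions are abbreviated from the list.

```python
from typing import Optional

# (first residue, last residue, description), transcribed from the list above.
NAIP_DOMAINS = [
    (1, 91, "N terminal domain, no recognizable motifs"),
    (92, 110, "hydrophobic, predicted membrane spanning (MEMSAT)"),
    (163, 477, "baculoviral IAP homology, followed by GTP/ATP binding site"),
    (479, 496, "hydrophobic, predicted membrane spanning (MEMSAT)"),
    (497, 1232, "possible receptor domain"),
]


def domain_of(residue: int) -> Optional[str]:
    """Return the listed domain containing the residue, or None if it lies in a gap."""
    for first, last, description in NAIP_DOMAINS:
        if first <= residue <= last:
            return description
    return None


for residue in (100, 130, 300, 478, 1232):
    print(residue, domain_of(residue))
```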

We know of at least three exons that comprise 400 bp of 5' untranslated region (5' UTR); it is possible that more exist. A striking feature of this region is the presence of a perfect duplication of a 90 bp region in the 5' UTR before exon 2 and in the region bridging exons 2 and 3 (Table 4). In addition, the 3' untranslated region comprising exon 17 has been found to contain a 550 bp interval that has potential coding region detected by the GRAIL program with high homology (P=1.1e-37) to the chicken integral membrane protein, occludin (Furuse et al., 1993). There exists the possibility that this represents a chimeric transcript; however, occludin homologous sequence has been detected in four different cDNA clones and two isoforms of the gene. The possibility of the occludin sequence representing a coding exon of the NAIP gene with the putative 3' UTR actually being heteronuclear RNA is also unlikely given the consistency with which the 3' UTR is observed and the presence of in frame translational stop codons mapping upstream of the region of occludin homology. Preliminary RT-PCR analysis indicates that the occludin tract is transcribed.

Tissue Expression

Hybridization of a Northern blot containing adult tissue mRNA with an exon 14 probe detected bands only in adult liver (approximately 6 and 7 kb bands) and placenta (7 kb, FIG. 6). Although the level of expression in adult CNS is not sufficient to result in visible bands on Northern analysis, successful reverse transcriptase-PCR (RT-PCR) amplification of the NAIP transcript using spinal cord, fibroblast and lymphoblast RNA suggests transcriptional activity in these tissues.

Detection of Truncated and Internally Deleted Versions of the NAIP gene

In the analysis of the PAC contig, the clones 238D12 and 30B2 were noted to show significant sequence similarity with 125D9 but not to contain the NotI site in PAC 125D9 that is located in NAIP exon 6. This indicated the possibility of duplicated copies of the NAIP gene and so further analysis by hybridization of Southern blots containing PAC DNA with NAIP exon probes and PCR STS content assessment was undertaken. In this manner, two aberrant versions of the NAIP locus were detected, one with exons 2 to 7 deleted (PAC 238D12), and another with exons 6, 7 and 12 to 15 deleted (PACs 30B2 and 250I7). The presence of identical sized bands in both genomic and PAC DNA on Southern blot analysis, as well as PCR results outlined below, obviates the possibility that the deletions represent in vitro PAC artifacts rather than the in vivo situation. Thus, genomic DNA Southern blots hybridized with NAIP exon probes revealed more bands than would be expected with a single intact copy of the NAIP gene. For example, probing of blots containing BamHI restricted genomic DNA with NAIP exons 3-11 should lead to a single band comprised of equal sized contiguous 14.5 kb BamHI fragments in the intact NAIP locus (FIG. 11). Instead, two additional bands are seen at 9.4 and 23 kb (FIG. 14), fragments that are seen in PACs 238D12 and 30B2/250I7 respectively. The 9.4 kb BamHI fragment has been subcloned from a cosmid and found to contain exons 8-11 with a deletion incorporating exons 2 to 7 occurring just upstream of the 8th exon (FIG. 11). The 23 kb band is generated by a 6 kb deletion removing a BamHI site leading to the replacement of the two contiguous 14.5 kb BamHI fragments with a 23 kb BamHI fragment containing exons 2 to 5 and 8 to 11 and lacking exons 5 and 6 as depicted in FIG. 11. The left side of this deletion was mapped by the fact that amplification with primers 1933 and 1926 generated a product whereas PCR with 1933 and 1923 did not (data not shown). PCR employing primers 1927 and 1933, constructed to amplify a 4.2 kb junction fragment spanning the 6 kb deletion (FIG. 11), generated the appropriate product as shown by size and sequencing in both genomic DNA and PACs 30B2/250I7. The variable dosage of both the 9.4 and 23 kb bands seen in genomic DNA from different individuals indicates that the two partially deleted versions of the NAIP gene are present in multiple and polymorphic number in the general population.

A further level of complexity was detected with the identification of clones from a non-SMA human fetal brain cDNA library deleted for exons 11 and 12 (Scheme #1), some of which also had exons 15 and 16 (Scheme #1) absent (FIG. 10). The fact that these deletions result in frame shifts and premature protein truncation indicates that they are, rather than normal splicing variants, more likely the result of transcription of the deleted and truncated versions of the NAIP gene that are present in the general population (FIG. 11). In all, a profile of a region containing a variable number of copies of internally deleted and truncated versions of the NAIP locus, some of which are transcribed, has emerged from our analysis.

Probings of blots containing DNA from the somatic cell hybrid HHW 1064 (Gilliam et al., 1989) with NAIP exonic probes indicate that all forms of the NAIP gene are confined to the 30 Mb deleted region of 5q11-13.3 contained in the derivative chromosome 5 of this cell line. This finding has been confirmed by FISH probings with an NAIP exon 13 probe (unpublished data).

NAIP Gene Mutational Analysis

Probing of genomic Southern blots with PCR amplified NAIP exons 3 to 10 revealed the absence of a 4.8 kb EcoRI/BamHI fragment containing exons 5 and 6 in the four affected individuals of consanguineous Type III SMA family 24561 (FIGS. 11 and 14). The same probing of BamHI digested DNA from this family revealed the absence of a 14.5 kb band, also in keeping with a loss of exons 5 and 6 as outlined above (FIGS. 11 and 14). Similar results were observed in two other French Canadian SMA families that were also believed consanguineous.

In order to confirm the proposed deletion of exons 5 and 6, primers homologous to these exons were made (primers 1893, 1864, 1863, 1910 and 1887, identified by arrows in FIG. 11). Results of a representative PCR amplification of DNA from the family 24561 and a second Type III SMA consanguineous family using exon 5 specific primers (primers 1864 and 1863), along with a simultaneous reaction of an exon 13 sequence included to rule out a failure of the PCR, are shown in FIG. 15. Absence of amplification of exon 5 can be seen to cosegregate with the SMA phenotype.

In order to determine if the exon 5 and 6 NAIP gene deletion was an SMA mutation, Southern blot analysis was conducted. An 800 bp EcoRV single copy probe that mapped immediately to the 3' side of the 6 kb exon 5 and 6 deletion was employed (FIG. 11). Hybridization of this marker to EcoRI Southern blots detected both a 9.4 kb EcoRI fragment containing exons 5 and 6 from the intact NAIP locus as well as a 3 kb EcoRI band from the exon 5 and 6 deleted copy of the NAIP gene. Analysis was conducted on EcoRI Southern blots containing DNA from over 900 unrelated members of myotonic dystrophy, ADPKD and cystic fibrosis families obtained from our DNA diagnostic laboratory. The 9.4 kb band was seen in all individuals, in keeping with the presence of at least one copy of exons 5 and 6 in each of the approximately 900 individuals tested. In addition, the 3 kb band was observed in every individual, reflecting a virtually complete dispersion of some form of the exon 5 through 6 deleted NAIP gene in the general population. Moreover, the variable band dosage observed for the 3 kb band suggested that the number of copies of the exon 5-6 deleted NAIP gene is polymorphic, possibly ranging as high as 4 or 5 copies per genome.

PCR analysis was then extended to 110 SMA families, employing exon 5 and 6 primers. Seventeen of 38 (45%) Type I SMA individuals and 13 of 72 (18%) Type II and III SMA individuals were homozygously deleted for these exons. Assuming random assortment of chromosomes and therefore taking the square root of the observed frequency of homozygous exon 5 through 6 deleted individuals yields estimated frequencies for exon 5 through 6 deleted chromosomes of 67% in Type I SMA and 42% in Type II/III SMA. PCR analysis next conducted on 168 parents of SMA children revealed failure of amplification, suggesting homozygous deletion of exons 5 and 6, in three individuals. This finding was confirmed by Southern analysis in the two cases with sufficient DNA for this assay. The two individuals, aged 28 and 35 and both parents of Type I SMA children, when interviewed by telephone described themselves to be physically well, reporting no symptoms suggestive of SMA. It was thus concluded that the deletion of NAIP exons 5 through 6 in isolation, while possibly reflecting more severe deletions in individuals with SMA as outlined below, can be clinically innocuous, associated either with an exceedingly mild SMA or even a normal phenotype. Clinical assessment of these individuals is currently being undertaken.
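The chromosome frequencies quoted above follow from the homozygote counts under random assortment: if a fraction q of chromosomes carries the deletion, a fraction q squared of individuals is homozygous for it, so q is recovered as the square root of the observed homozygote frequency. A short sketch of that arithmetic, using the counts given in the text (the function name is illustrative):

```python
import math


def deleted_chromosome_frequency(homozygous_deleted: int, total: int) -> float:
    """Estimate the deleted-chromosome frequency q from q**2 = homozygotes / total."""
    if total <= 0 or not 0 <= homozygous_deleted <= total:
        raise ValueError("counts must satisfy 0 <= homozygous_deleted <= total, total > 0")
    return math.sqrt(homozygous_deleted / total)


# Counts from the text: 17 of 38 Type I and 13 of 72 Type II/III individuals.
type_1 = deleted_chromosome_frequency(17, 38)
type_2_3 = deleted_chromosome_frequency(13, 72)

print(f"Type I: {type_1:.0%}, Type II/III: {type_2_3:.0%}")
```

The estimate rests entirely on the random assortment assumption stated in the text; it would be biased in a sample enriched for consanguineous families.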

Judging both by the cDNA clones detected from fetal brain libraries as well as the make-up of RT-PCR NAIP products (FIG. 2), many and possibly all truncated copies of the NAIP gene appear to be transcribed. Given the apparently unaffected status of the three parents of individuals with SMA who do not have a copy of exons 4 and 5 in their genome, we believe that the exon 5 through 6 deleted version of NAIP is also translated. In keeping with this model, removal of exons 5 and 6 results in an in-frame deletion that extends the longest NAIP open reading frame upstream to a start methionine in exon 3 at nucleotide 211 (Table 4).

Furthermore, the protein sequence encoded by the deleted exon 5 and 6 IAP motif is approximately 35% homologous to the IAP motif encoded in exons 10 and 11, possibly accounting for the absence of discernible phenotype in the three exon 5 through 6 deleted individuals. One possible model is that a single copy of exon 5 through 6 deleted NAIP on each chromosome results in the mild SMA phenotype, while individuals with greater than 3 or 4 copies of the exon 4-5 deleted NAIP locus are clinically unaffected. The possibility that duplication of the SMA gene underlies the disease has recently been proposed by DiDonato et al. (1994).

RT-PCR amplification of RNA from SMA and non-SMA tissue. The results of RT-PCR amplification using RNA from both non-SMA and SMA individuals as template are shown in FIG. 16.

We have established that at least some of the internally deleted and truncated NAIP versions are transcribed. In order to distinguish transcripts from the intact NAIP gene, which would produce a functional protein, from those that would not, an effort was made to RT-PCR amplify transcripts that were as large as possible. Given the 2.2 kb size of exon 14, this was found to be one which encompassed exon 2 and the 5' end of exon 13. No product was detected at the level of ethidium bromide staining after first round PCR. Therefore, second round nested amplification was undertaken as set out in the description of FIG. 16.

A representative subset of RT-PCR experiments is shown in FIG. 16. PCR of reverse transcribed product using RNA from non-SMA tissues as template and reverse transcribing from exons 10 or 13 consistently amplified product of the expected size. In contrast, similar RT-PCR experiments on RNA from SMA tissue revealed no amplification in five cases, in keeping with the marked down regulation or complete absence of the intact transcript in such individuals (FIG. 16A). The RNA obtained from the SMA tissues was no more than 12 hours post-mortem. As we have no difficulty in amplifying intact NAIP transcript from normal tissue which is 24 hours post-mortem, we do not believe the difficulty in amplification arises from RNA degradation. Furthermore, difficulty with amplification was seen for all SMA tissues, which argues against the possibility that NAIP is transcribed solely in the motor neuron with depletion of this cell type in SMA resulting in RT-PCR failure in spinal cord tissue.

In the cases where amplification was observed, sequencing of RT-PCR products has revealed the following findings, as shown in FIGS. 16A, 16B and 16C:

(i) an in-frame deletion of codons 153 and 190 from the 3' end of exon 5from sample a9.

(ii) deletion of exon 6 resulting in a frame shift with a stop codon occurring 73 nucleotides into exon 7 in a product amplified by exon 5 primer 1864 and exon 13 primer 1974 from sample a2.

(iii) an approximate 50 nucleotide insertion in a product amplified by exon 4 primer 1886 and exon 13 primer 1974 from sample a7.

(iv) deletion of a glutamic acid codon number 158 in exon 5 in association with deletion of exons 11 and 12 in a product amplified by exon 5 primer 1864 and exon 13 primer 1974 from sample a3.

(v) deletion of exons 11 and 12 introducing a frame shift and a stop codon 14 nucleotides into exon 13 in a product amplified by exon 9 primer 1844 and exon 13 primer 1974 in samples a2, a3, a9 and a11.

In all, employing PCR on material reverse transcribed from exon 13, we have observed successful amplification of the appropriate product from all 12 non-SMA tissues attempted and in only one of 12 SMA tissues. In the latter case, sample a12, amplification was from exons 13 to 4 only; whether the transcript also incorporates exons 2 to 3 or 14 to 17 is unknown. We believe that these data provide strong evidence for NAIP being the SMA gene.

Role of NAIP Protein

The discovery of a neuronal apoptosis inhibitor protein gene in the SMA region of chromosome 5 demonstrates that the SMA condition is a result of deletions in the apoptosis inhibitor protein domains. The long-term survival of motor neurons is dependent on the production of complete neuronal apoptosis inhibitor protein. The deletion of the apoptosis inhibitor protein domain compromises the protein activity. We have demonstrated that approximately 70% of all SMA affected individuals have deletions of exons 5 and 6 of the NAIP gene on chromosome 5.

The identified region of 5q13.1 contains a variable number of copies of intact and partially deleted forms of the NAIP gene. While we cannot rule out the presence of additional loci in 5q13.1 that when mutated contribute to the SMA phenotype, we believe that mutations of the NAIP gene are necessary and possibly sufficient for the genesis of SMA. In contrast to most autosomal recessive diseases where causal mutations are usually detected in the single copy of a given gene, we propose that an SMA chromosome is characterized by a paucity or, for severe SMA mutations, an absence of both the intact NAIP gene as well as that version which has had exons 3 and 4 deleted. The genesis of such chromosomes may involve unequal crossovers leaving the chromosome depleted for these loci, with the resulting absence of the NAIP gene product leading to SMA.

Diagnosis of SMA

The delineation of an SMA genotype in a given individual is complicated by the unusual amplification of the NAIP gene in the 5q13.1 region. Probings of Southern blots containing genomic DNA with NAIP exon probes invariably reveal bands resulting from copies of internally deleted and truncated versions of the NAIP gene. The presence of variable numbers of the different forms of the NAIP loci in the general population is therefore the norm and not diagnostic of an SMA mutation per se, complicating the mutational analysis of the NAIP gene. If the detection of genomic DNA containing altered NAIP loci affords no proof of an SMA chromosome then, by default, the search must be for the absence of the normal NAIP gene. However, we have detected rare individuals with no copies of exons 3-4 in their genome who are clinically unaffected, an observation that is in keeping with what we know of NAIP gene structure. Consequently, the identification of an SMA chromosome is contingent on the absence of both the intact as well as the exons 3-4 only deleted forms of NAIP. Assaying for their absence is complicated by the presence of segments of normal NAIP gene in each of the other, more extensively deleted, forms of the NAIP locus. One can see, for example, that if a given SMA individual had in their genome only the deleted versions of NAIP found on PACs 238D12 and 30B2, that is exons 1-6 deleted and exons 5, 6 and 11-14 deleted, respectively (FIGS. 10 and 11), they would appear by PCR and Southern analysis to have the exons 5-6 only deleted version of NAIP and therefore to have non-SMA chromosomes. We believe that many and perhaps most of the numerous exon 5-6 deleted SMA individuals we have observed actually have chromosomes with such a configuration, containing neither the intact NAIP loci nor the exons 5-6 only deleted version but rather some other combination of more severely truncated/deleted versions of the locus with resultant absence of intact NAIP translation. Support for this interpretation comes from our inability to amplify normal NAIP transcripts employing RT-PCR on RNA from Type I SMA tissue.
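The masking effect described in this passage can be made concrete with set arithmetic: an exon-level PCR or Southern assay reports an exon as absent only if no copy in the genome carries it. The sketch below uses the two deleted versions named in the text; the total of 17 exons and the helper name apparently_missing are assumptions made for illustration.

```python
ALL_EXONS = set(range(1, 18))  # exons 1-17; the total of 17 is an assumption

# Exons missing from each deleted version, as given in the text.
missing_238D12 = set(range(1, 7))            # exons 1-6 deleted
missing_30B2 = {5, 6} | set(range(11, 15))   # exons 5, 6 and 11-14 deleted


def apparently_missing(*missing_per_copy):
    """Exons that no copy can supply, i.e. what an exon-level assay would score as deleted."""
    present = set()
    for missing in missing_per_copy:
        present |= ALL_EXONS - missing
    return ALL_EXONS - present


masked = apparently_missing(missing_238D12, missing_30B2)
print(sorted(masked))
```

With only these two copies present, the assay sees every exon except 5 and 6, which is indistinguishable at this level from the exons 5-6 only deleted version even though no single copy is intact outside that interval.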

In all, the evidence in support of mutations in or the absence of the NAIP gene causing SMA includes the following:

(i) The strong possibility that the NAIP, given its homology with baculoviral IAPs, functions as an inhibitor of apoptosis. This characteristic is wholly compatible with the pathology of SMA. It is noteworthy that mutations in a regulator of apoptosis have been previously suggested as a speculative cause of SMA (Oppenheim, 1991; Sarnat, 1992).

(ii) The mapping of the NAIP locus within the recombination defined critical SMA interval and the fact that the three polymorphic markers that have been shown to be in strong linkage disequilibrium with type I SMA, CATT-40G1 (McLean et al., 1994), C272 (Melki et al., 1994) and AG-1 (DiDonato et al., 1994), all map to PAC 125D9 and are present on NAIP introns (FIG. 9C).

(iii) The nature of linkage disequilibrium observed between the type 1 SMA phenotype and the 5q13.1 markers. We have shown that the CATT-40G1 CTR sublocus, which is frequently duplicated on non-SMA chromosomes (Roy et al., 1994), is deleted in 80% of type 1 SMA chromosomes compared with 45% of non-SMA chromosomes (McLean et al., 1994). This finding is in keeping with a depletion of the number of NAIP genes on SMA chromosomes. In a similar fashion, Melki et al. (1994) have observed "a heterozygote deficiency" consisting of a reduced number of bands for the C272 CTR in Type I SMA, reflecting, they propose, chromosomal deletions. DiDonato et al. (1994) have also seen a striking reduction in the number of AG1 CTR sub-loci in Type I SMA individuals when compared with non-SMA individuals. We believe that the observation by three groups of the depletion of these intraNAIP markers on Type I SMA chromosomes fits well with the proposed model of a lack or absence of both the intact and exon 5-6 deleted form of the NAIP gene underlying the disease.

(iv) The markedly increased frequency of NAIP exon 5-6 deletions observed in SMA chromosomes (approximately 67% of type 1 SMA chromosomes and 42% of type 2/3 SMA chromosomes) compared with that detected for non-SMA chromosomes (2-3%). As outlined above, we believe that this phenomenon reflects the rarity or absence of both the intact NAIP gene as well as the NAIP version with only exons 5 through 6 deleted in the SMA chromosomes, leaving only the more significantly internally deleted and truncated forms of the NAIP gene present.

(v) Our consistent inability to RT-PCR amplify appropriate size transcripts from RNA obtained from 11 of 12 SMA individuals despite success with 12 of 12 RNAs from non-SMA individuals. Furthermore, sequencing of those RT-PCR products that could be obtained from type 1 SMA material revealed a variety of mutations and deletions.

(vi) The presence of a variable number of copies of truncated and internally deleted versions of the NAIP gene is similar to the situation reported in the autosomal dominant polycystic kidney disease gene (ADPKD, European Polycystic Kidney Disease Consortium, 1994). In this case portions of unprocessed pseudogenes corresponding to the causative gene were found to map elsewhere on chromosome 16p. The key difference is that with the NAIP locus the mutated form of the gene is amplified.

In this regard the NAIP region of 5q13.1 has more similarity to the area of chromosome 6 containing CYP21, the gene that encodes steroid 21-hydroxylase (Wedell and Luthman, 1993). CYP21, which when mutated causes an autosomal recessive 21-hydroxylase deficiency, has been observed in 0-3 copies in individuals. There also exists in the region a variable number of inactive pseudogene copies of CYP21 known collectively as CYP21P. The majority of the CYP21 mutations that have been observed in 21-hydroxylase deficiency can also be found in some form of CYP21P and it is thought that the pseudogenes act as a source of the mutations observed in CYP21. The truncated and internally deleted NAIP genes are analogous to CYP21P, only instead of the gene conversion postulated for CYP21/CYP21P it is possible that unequal crossing over results in chromosomes deleted for forms of the NAIP gene that encode functional protein. The existence of a polymorphic number of mutated NAIP genes on 5q13.1 is a credible mechanism for generation of SMA chromosomes in this fashion.

Baculoviral IAPs

NAIP shows significant homology with the two baculoviral gene products, CpIAP and OpIAP, that are capable of inhibiting insect cell apoptosis (Table 4). Insect cell apoptosis following baculoviral infection has been well documented and is postulated to be a defence mechanism. Premature death of infected insect cells results in an attenuation of viral replication (Clem and Miller, 1994a). CpIAP and OpIAP are thought to represent baculoviral responses to this apoptotic mechanism. Both act independently of other viral proteins to inhibit host insect cell apoptosis, thereby permitting increased viral proliferation (Clem and Miller, 1994a, 1994b). They are known to be strongly similar only to each other; until now no sequence similarities with cross phyla proteins have been reported. Their mode of action is unknown, although some interaction with DNA has been postulated.

The role and cellular localization of NAIP have not yet been established. However, we believe that the significant sequence similarity between NAIP and the baculoviral IAPs, especially over such a considerable phylogenic distance, combined with the previously postulated role of inappropriate apoptosis in the pathogenesis of SMA, makes it likely that NAIP serves as an apoptosis inhibitor in the motor neuron. Transfection assays employing NAIP both in insect and mammalian neuronal cells will help in this regard.

One possibility is that specific ligand binding of the carboxy terminus of the NAIP activates the GTP binding site which in turn activates the IAP domain. The survival of a motor neuron might, therefore, be dependent on the presence of the ligand(s): should the concentration drop below a critical threshold, the IAP domains cease to function with ensuing cell death. This represents a possible mechanism for the natural winnowing of motor neurons observed in embryogenesis. The source of the ligand might be postulated to be either muscle cells or Schwann cells. The embryogenesis of motor neurons might, therefore, be viewed as a competition between the cells with only those that make sufficient contacts to maintain the NAIP occupancy rate surviving.

If, as postulated, NAIP does inhibit apoptosis, it is unclear whether NAIP is a constituent of a previously uncharacterized mammalian apoptotic pathway or a (presumably) upstream component of the pathway involving the human inhibitor of apoptosis, Bcl-2 (Vaux et al., 1988; Hockenberry et al., 1990; Garcia et al., 1992). Assays employing apoptosis inhibition deficient baculoviral strains have revealed that Bcl-2 does not complement the deficiency in such assays (Clem and Miller, 1994b). If NAIP is a functional homolog of the baculoviral IAPs, then this observation might suggest a role in a previously uncharacterized eucaryotic apoptotic pathway. One possibility is that NAIP represents an intersection of a novel apoptotic mechanism with the neurotrophic cytokine, ciliary neurotrophic factor (CNTF, Raff et al., 1993; Meakin and Shooter, 1993) or one of the downstream components of this pathway (Stahl et al., 1994). CNTF null mice show a pathologic picture that is similar to that of SMA with normal development of the neurons initially followed by their progressive apoptotic depletion (Masu et al., 1993). Moreover, although deprivation of neurotrophins under the right conditions may result in apoptosis of cultured neurons, it is noteworthy that CNTF is alone among these agents in not having such apoptosis rescued by Bcl-2. This finding led the workers who made the observation to suggest the presence of a second eucaryotic apoptotic pathway (Allsopp et al., 1993). The existence of such distinct pathways may underlie the synergistic effect observed with the marked retardation of motor neuron loss in the wobbler mouse mutant following treatment with brain derived neurotrophic factor (BDNF) and CNTF (Mitsumoto et al., 1994).

The role of the lipid attachment site in NAIP is unknown. Similar sites have been known to serve as procaryotic protein leader sequences usually situated in the protein's amino terminus. We have detected the consensus pattern in 218 human sequences in the Swiss-Protein Database (release 28). These sequences are present in a variety of functional settings: transmembrane regions, signal sequences, extracellular and cytoplasmic domains. One possibility is that the lipoprotein attachment site is extracellular and binds a constituent of the Schwann cell proteolipid in a manner that has been postulated for the apoptosis inhibiting interaction of integrin with the extracellular matrix (Meredith et al., 1993; Frisch and Francis, 1994). Furthermore, the site may play a more active role in the hepatic form of the NAIP that we have observed on Northern blot analysis. It is noteworthy that serum fatty acid abnormalities have been detected in children with SMA (Kelley and Sladky, 1986).

The identified region of 5q13.1 contains, in addition to the NAIP gene, a variable number of copies of internally deleted and truncated forms of the gene. We believe that a lack or absence of both the intact NAIP gene and the NAIP locus with exons 5 and 6 deleted from a given individual's genome is likely to cause SMA. In this regard, the identification of NAIP has allowed us to develop accurate molecular based diagnoses of SMA as well as directing the formulation of conventional and genetic therapies for these debilitating conditions. Furthermore, the identification of genes showing homology with the NAIP locus and proteins that interact with NAIP may help in the continuing elucidation of apoptotic mechanisms in mammalian cells.

EXAMPLES

Family Material

Clinical diagnoses were conducted as described in MacKenzie et al. (1993) with all patients fulfilling the diagnostic criteria given therein. DNA was isolated from peripheral leukocytes as described (MacKenzie et al., 1993).

Genetic and Linkage Disequilibrium Analyses

Genotyping with microsatellite markers was as outlined in MacKenzie et al. (1993) and McLean et al. (1994). The following 5q13.1 loci were used as described: D5S112 (Brzustowitcz et al., 1990), D5S351 (Hudson et al., 1992), D5S435 (Soares et al., 1993), D5S557 (Francis et al., 1993), D5S629 and D5S637 (Clermont et al., 1994), D5S684 (Brahe et al., 1994), Y98T, Y97T, Y116T, Y122T and CMS (Kleyn et al., 1993), CATT (Burghes et al., 1994; McLean et al., 1994) and MAP1B (Lien et al., 1991).

Linkage disequilibrium analyses were conducted using parameters that can accommodate the multiple alleles seen with microsatellite repeats. Given the complexities inherent in disequilibrium analyses, a total of 4 different parameters for which multiple alleles may be used were employed. These were Dij, Dij' and D' as defined in Hedrick (1987) as well as the chi square test. Two of these, Dij and Dij', have given the best a posteriori positional information in a previous study on myotonic dystrophy (Podolsky et al., 1994). The patient and control population is as outlined in McLean et al. (1994).
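For orientation, the multiallelic disequilibrium parameters named above are commonly written as Dij = xij - pi*qj, Dij' = Dij / Dmax, and D' as the sum over all allele pairs of pi*qj*|Dij'|, where xij is the frequency of the chromosome class carrying allele i at one locus and allele j at the other. The sketch below implements those commonly cited forms; that they correspond exactly to the computation performed in this study is an assumption, and the function name and example frequencies are invented for illustration.

```python
def disequilibrium(haplotype_freqs):
    """Return (D, D_norm, D_overall) for a table x[i][j] of haplotype frequencies.

    D[i][j]      = x_ij - p_i * q_j
    D_norm[i][j] = D[i][j] / D_max          (the normalized coefficient, Dij')
    D_overall    = sum of p_i * q_j * |D_norm[i][j]| over all i, j   (D')
    """
    p = [sum(row) for row in haplotype_freqs]        # allele frequencies, locus 1
    q = [sum(col) for col in zip(*haplotype_freqs)]  # allele frequencies, locus 2
    d, d_norm, d_overall = [], [], 0.0
    for i, row in enumerate(haplotype_freqs):
        d_row, norm_row = [], []
        for j, x_ij in enumerate(row):
            d_ij = x_ij - p[i] * q[j]
            # D_max depends on the sign of D_ij.
            if d_ij < 0:
                d_max = min(p[i] * q[j], (1 - p[i]) * (1 - q[j]))
            else:
                d_max = min(p[i] * (1 - q[j]), (1 - p[i]) * q[j])
            norm = d_ij / d_max if d_max > 0 else 0.0
            d_row.append(d_ij)
            norm_row.append(norm)
            d_overall += p[i] * q[j] * abs(norm)
        d.append(d_row)
        d_norm.append(norm_row)
    return d, d_norm, d_overall


# Invented frequencies: rows are chromosome classes, columns are marker alleles.
complete = [[0.5, 0.0], [0.0, 0.5]]          # each allele confined to one class
independent = [[0.25, 0.25], [0.25, 0.25]]   # no association

d_c, norm_c, overall_c = disequilibrium(complete)
d_i, norm_i, overall_i = disequilibrium(independent)
print(overall_c, overall_i)
```

The two invented tables bracket the range of D': complete association between marker allele and chromosome class at one extreme, independence at the other.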

Cosmid, YAC and PAC Arraying

Cosmid and YAC contig assembly was as outlined in Roy et al. (1994). PACs were constructed as outlined in Ioannou et al. (1994). Using these procedures three PAC libraries have been constructed with a combined total of 175,000 clones and propagated as individual clones in microtiter dishes (Ioannou et al., unpublished results). Pools derived from the three libraries (designated LLNL PAC1, RPCI1 and RPCI2) were screened with 5q13.1 STSs. Positive PACs were arranged into contiguous and overlapping arrays by further analysis with additional STSs combined with probings of Southern blots containing PAC DNA by single copy genomic DNA and cDNA probes.

DNA Manipulation and Analysis

Four genomic libraries containing PAC 125D9 insert were constructed by BamHI, BamHI/NotI, total and partial Sau3AI (selected for 5 kb insert size) digestions of the PAC genomic DNA insert and subcloned into Bluescript vector. Sequencing of approximately 400 bp of both termini of 200 five kb clones from the partial Sau3AI digestion library in the manner of Chen et al. (1993) was undertaken.
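As a rough plausibility check on the end-sequencing effort, the raw sequence generated can be compared with the size of the PAC insert. The sketch below takes the insert size as the sum of the 24 kb and 130 kb NotI fragments described elsewhere in this disclosure, which is an assumption, and applies the standard Lander-Waterman idealization (reads placed uniformly at random) to estimate the fraction of bases touched by at least one read. That model is introduced here for illustration only and is not part of the original analysis.

```python
import math

clones = 200
reads = clones * 2                    # both termini of each clone
read_length_bp = 400                  # approximate, per the text
insert_size_bp = 24_000 + 130_000     # assumed: sum of the two NotI fragments

raw_sequence_bp = reads * read_length_bp
coverage = raw_sequence_bp / insert_size_bp

# Lander-Waterman idealization: expected fraction of bases covered by >= 1 read.
expected_fraction_covered = 1 - math.exp(-coverage)

print(f"raw sequence: {raw_sequence_bp} bp, coverage: {coverage:.2f}x, "
      f"expected fraction covered: {expected_fraction_covered:.2f}")
```

Under these assumptions the reads amount to about one-fold coverage. Because each 5 kb clone also links its two end reads, the clones themselves span far more of the insert than the reads alone sequence.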

Coding sequences from the PACs were isolated by the exon amplification procedure as described by Church et al. (1994). PACs were digested with BamHI or BamHI and BglII and subcloned into pSPL3. Pooled clones of each PAC were transfected into COS-1 cells. After a 24 h transfection total RNA was extracted. Exons were cloned into pAMP10 (Gibco, BRL) and sequenced utilizing primer SD2 (GTG AAC TGC ACT GTG ACA AGC TGC).

DNA sequencing was conducted on an ABI 373A automated DNA sequencer. Two commercial human fetal brain cDNA libraries in lambda gt (Stratagene) and lambda ZAP (Clontech) were used for candidate transcript isolation. The Northern blot was commercially acquired (Clontech) and probing was performed using standard methodology.

In general, primers used in the paper for PCR were selected for T_m values of 60° C. and can be used with the following conditions: 30 cycles of 94° C., 60 s; 60° C., 60 s; 72° C., 90 s. PCR primer mappings are as referred to in the figure legends and text. Primer sequences are as follows:

__________________________________________________________________________
1258       ATg CTT ggA TCT CTA gAA Tgg - SEQ ID NO: 3
1285       AgC AAA gAC ATg Tgg Cgg AA - SEQ ID NO: 4
1343       CCA gCT CCT AgA gAA AgA Agg A - SEQ ID NO: 5
1844       gAA CTA Cgg CTg gAC TCT TTT - SEQ ID NO: 6
1863       CTC TCA gCC TgC TCT TCA gAT - SEQ ID NO: 7
1864       AAA gCC TCT gAC gAg Agg ATC - SEQ ID NO: 8
1884       CgA CTg CCT gTT CAT CTA CgA - SEQ ID NO: 9
1886       TTT gTT CTC CAg CCA CAT ACT - SEQ ID NO: 10
1887       CAT TTg gCA TgT TCC TTC CAA g - SEQ ID NO: 11
1893       gTA gAT gAA TAC TgA TgT TTC ATA ATT - SEQ ID NO: 12
1910       TgC CAC TgC CAg gCA ATC TAA - SEQ ID NO: 13
1919       TAA ACA ggA CAC ggT ACA gTg - SEQ ID NO: 14
1923       CAT gTT TTA AgT CTC ggT gCT CTg - SEQ ID NO: 15
1926       TTA gCC AgA TgT gTT ggC ACA Tg - SEQ ID NO: 16
1927       gAT TCT ATg TgA TAg gCA gCC A - SEQ ID NO: 17
1933       gCC ACT gCT CCC gAT ggA TTA - SEQ ID NO: 18
1974       gCT CTC AgC TgC TCA TTC AgA T - SEQ ID NO: 19
1979       ACA AAg TTC ACC ACg gCT CTg - SEQ ID NO: 20
__________________________________________________________________________
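The disclosure does not state how primer melting temperatures were estimated. One widely used rule of thumb for short oligonucleotides is the Wallace rule, Tm = 2(A+T) + 4(G+C) in degrees Celsius; applying it here is an assumption made purely for illustration. The sketch below evaluates primers 1258 and 1285 from the table and totals the cycling time implied by the stated conditions. The function name is hypothetical.

```python
def wallace_tm(primer: str) -> int:
    """Approximate Tm in degrees C by the Wallace rule: 2*(A+T) + 4*(G+C).

    Spaces are ignored and case is folded, so the table's 'ATg CTT ...' notation
    can be passed in directly.
    """
    seq = "".join(primer.split()).upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    if at + gc != len(seq):
        raise ValueError("primer contains characters other than A, C, G, T")
    return 2 * at + 4 * gc


tm_1258 = wallace_tm("ATg CTT ggA TCT CTA gAA Tgg")
tm_1285 = wallace_tm("AgC AAA gAC ATg Tgg Cgg AA")

# 30 cycles of 60 s denaturation, 60 s annealing and 90 s extension.
cycling_minutes = 30 * (60 + 60 + 90) / 60

print(tm_1258, tm_1285, cycling_minutes)
```

Both primers evaluate to 60 under this rule, in line with the stated target, although other primers in the table give somewhat different values, so the rule should be read as an approximation rather than as the selection method actually used.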

RT-PCR

cDNA was synthesized in a 20 μl reaction utilizing 7 μg of total RNA. The RNA was denatured for 5 minutes at 95° C. and cooled to 37° C. Reverse transcription was performed at 42° C. for 1 hour after addition of 5 μl 5× reverse transcription buffer, 2 μl 0.1 M DTT, 4 μl 2.5 mM dNTPs, 8 units RNasin, 25 ng cDNA primer (1285) and 400 units of MMLV (Gibco, BRL). 1 μl of cDNA was utilized as template in subsequent 50 μl PCR reactions. 1 μl of this primary PCR was utilized as template for secondary PCR amplifications.

Sequence Analysis

Primary DNA sequence data was edited with the TED program (Gleeson and Hillier, 1991). As many as possible of the 200 partially sequenced 5 kb clones from the partial Sau3AI digestion library were arranged into overlapping arrays using the XBAP Staden package (Dear and Staden, 1991). Sequence data was also assembled and analyzed using the GCG sequence analysis package (Genetics Computer Group, 1991). Protein domain homologies were found by searching the Prosite protein database (Bairoch and Bucher, 1993). The MEMSAT program was also used to search for transmembrane domain regions (Jones et al., 1994).

    TABLE 1
    ______________________________________
    The YACs isolated in this study, their size and library of origin are
    listed. NCE: National Centers of Excellence, Toronto, Ontario, Canada.
    ICRF: Imperial Cancer Research Fund, CEPH: Centre d'Etude du
    Polymorphisme Humaine.

    YAC            SIZE        LIBRARY
    ______________________________________
    12H1           560 kb      NCE
    12H4           270 kb      NCE
    24D6           750 kb      NCE
    27H5           630 kb      NCE
    33H10          1.3 Mb      NCE
    H0416          390 kb      ICRF
    E0320          440 kb      ICRF
    G1138          850 kb      ICRF
    A0848          350 kb      ICRF
    D06100         580 kb      ICRF
    D0981          450 kb      ICRF
    919C2          800 kb      CEPH
    755B12         1 Mb        CEPH
    754H5          500 kb      CEPH
    ______________________________________

    TABLE 2
    __________________________________________________________________________
    PROBE            SOURCE/REFERENCE
    __________________________________________________________________________
    YD33             STS developed from Alu-5'-trp PCR product of YAC D06100
    Y13.1            STS developed from inter-Alu-5' PCR product of YAC 12H1 (this study)
    Y14.1            STS developed from Alu-3'-ura PCR product of YAC 12H4 (this study)
    Y15.1            STS developed from Alu-5'-ura PCR product of YAC 12H4 (this study)
    Y9.2             STS developed from inter-Alu-5' PCR product of YAC 27H5 (this study)
    Y5.6             STS developed from inter-Alu-3' PCR product of YAC 24D6 (this study)
    Y11.2            STS developed from Alu-3'-trp PCR product of YAC 33H10 (this study)
    pZY8             subcloned 1.3 kb HindIII fragment from cosmid 250B6 (this study)
    H7T733           Alu 33-T7 PCR product from cosmid 1H7 (this study)
    p151.2           subcloned 1.2 kb inter-Alu PCR product of cosmid 15F8 (this study)
    G10T333          Alu 33-T3 PCR product of cosmid 1G10 (this study)
    p402.1           subcloned 2.1 kb BamHI/HindIII fragment of cosmid 40G1 (this study)
    G3T733           Alu 33-T3 PCR product of cosmid 1G3 (this study)
    pL7              liver transcript isolated with subcloned 1.1 kb BamHI/SalI fragment from 58G12 (this study)
    p2281.8          subcloned 1.8 kb HindIII fragment of cosmid 228C8 (this study)
    F933             inter-Alu PCR product of cosmid 1F9 (this study)
    pGA1             fetal brain transcript isolated with cosmid 250B6
    β-glucuronidase  (Oshima et al., 1987)
    MAP1B            (Lien et al., 1991)
    Y122T            (Kleyn et al., 1993)
    D5S351           (Yaraghi et al., in press)
    CMS-1            (Kleyn et al., 1993)
    D5S557           (Francis et al., 1993)
    Y98T             (Kleyn et al., 1993)
    D5S112           (Brzustowitcz et al., 1990)
    Y97T             (Kleyn et al., 1993)
    Y112U            (Kleyn et al., 1993)
    Y88T             (Kleyn et al., 1993)
    Y119T            (Kleyn et al., 1993)
    Y116U            (Kleyn et al., 1993)
    CATT-1           (Burghes et al., 1994; McLean et al., in press)
    Y55U             (Kleyn et al., 1993)
    D5S127           (Sherrington et al., 1991)
    Y38T             (Kleyn et al., 1993)
    D5S435           (Soares et al., 1993)
    D5S125           (Hudson et al., 1992)
    Y107U            (Kleyn et al., 1993)
    Y97U             (Kleyn et al., 1993)
    D5F149 (C212)    (Melki et al., 1994)
    D5F151 (C171)    (Melki et al., 1994)
    D5F150 (C272)    (Melki et al., 1994)
    D5F153 (C161)    (Melki et al., 1994)
    D5S637           (Clermont et al., 1994)
    D5S629           (Clermont et al., 1994)
    __________________________________________________________________________

    TABLE 3
    __________________________________________________________________________
    The homology of the GA1 component of neuronal apoptosis inhibitor protein
    gene (SEQ ID NO: 23) compared for homology with the inhibitor apoptosis
    polypeptides of the viruses Cydia pomonella (SEQ ID NO: 21) and Orgyia
    pseudotsugata (SEQ ID NO: 22).
    __________________________________________________________________________
                        1                                                   50
    Cydia pomonella     .......... .......... .......... .......... ..........
    Orgyia pseudots     .......... .......... .......... .......... ..........
    cGA1-concensus      TRTVDKPQKM ATQQKASDER ISQFDHNLLP ELSALLGLDA VQLAKELEEE

                        51                                                 100
    Cydia pomonella     .......... .......... .......... .......... ..........
    Orgyia pseudots     .......... .......... .......... .......... ..........
    cGA1-concensus      EQKERAKMQK GYNSQMRSEA KRLKTFVTYE PYSSWIPQEM AAAGFYFTGV

                        101                                                150
    Cydia pomonella     .......... .......... .......... .......... ..........
    Orgyia pseudots     .......... .......... .......... .......... ........MS
    cGA1-concensus      KSGIQCFCCS LILFGAGLTR LPIEDHKRFH PDCGFLLNKD VGNIAKYDIR

                        151                                                200
    Cydia pomonella     .........M SDLR..LEEV RLNTFEKWP. .VSFLSPETM AKNGFYYLGR
    Orgyia pseudots     SRAIGAPQEG ADMK..NKAA RLGTYTNWP. .VQFLEPSRM AASGFYYLGR
    cGA1-concensus      VKNLKSRLRG GKMRYQEEEA RLASFRNWPF YVQGISPCVL SEAGFVFTGK

                        201                                                250
    Cydia pomonella     SDEVRCAFCK VEIMRWKEGE DPAADHKKWA PQCPFVKGID VCGSI.....
    Orgyia pseudots     GDEVRCAFCK VEITNWVRGD DPETDHKRWA PQCPFVRN.. ..........
    cGA1-concensus      QDTVQCFSCG GCLGNWEEGD DPWKEHAKWF PKCEFLRSKK SSEEITQYIQ

                        251                                                300
    Cydia pomonella     .......VTT NNIQNTTTHD TIIGPA.... HPKYAHEAAR VKSFHNWPRC
    Orgyia pseudots     ........NA HDTPHDRAPP ARSAAA.... HPQYATEAAR LRTFAEWPRG
    cGA1-concensus      SYKGFVDITG EHFVNSWVQR ELPMASAYCN DSIFAYEELR LDSFKDWPRE

                        301                                                350
    Cydia pomonella     MKQRPEQMAD AGFFYTGYGD NTKCFYCDGG LKDWEPEDVP WEQHVRWFDR
    Orgyia pseudots     LKQRPEELAE AGFFYTGQGD KTRCFCCDGG LKDWEPDDAP WQQHARWYDR
    cGA1-concensus      SAVGVAALAK AGLFYTGIKD IVQCFSCGGC LEKWQEGDDP LDDHTRCFPN

                        351                                                400
    Cydia pomonella     CAYVQLVKGR DYVQKVI... TEACVLPGEN TTVSTAAPVS EPIPETKIEK
    Orgyia pseudots     CEYVLLVKGR DFVQRVM... TEACVVRDAD N......... ....EPHIER
    cGA1-concensus      CPFLQNMKSS AEVTPDLQSR GELCELLETT SESNLEDSIA VGPIVPEMAQ

                        401                                                450
    Cydia pomonella     .......EPQ VEDSKLCKIC YVEE...... .......CIV CFVPCGHVVA
    Orgyia pseudots     PAV....EAE VADDRLCKIC LGAE...... .......KTV CFVPCGHVVA
    cGA1-concensus      GEAQWFQEAK NLNEQLRAAY TSASFRHMSL LDISSDLATD HLLGCDLSIA

                        451                                                500
    Cydia pomonella     CAKCALSVDK CPMCRKIVTS VLKVYFS... .......... ..........
    Orgyia pseudots     CGKCAAGVTT CPVCRGQLDK AVRMYQV... .......... ..........
    cGA1-concensus      SKHISKPVQE PLVLPEVFGN LNSVMCVEGE AGSGKTVLLK KIAFLWASGC

                        501                                                550
    cGA1-concensus      CPLLNRFQLV FYLSLSSTRP DEGLASIICD QLLEKEGSVT EMCMRNIIQQ
    cGA1-concensus      LKNQVLFLLD DYKEICSIPQ VIGKLIQKNH LSRTCLLIAV RTNRARDIRR
    cGA1-concensus      VLETILEIQA FPFYNTVCIL RKLFSHNMTR LRKFMVYFGK NQSLQKIQKT
    cGA1-concensus      PLFVAAICAH WFQYPFDPSF DDVAVFKSYM ERLSLRNKAT AEILKATVSS
    cGA1-concensus      CGELALKGFF SCCFEFNDDD LAEAGVDEDE DLTMCLMSKF TAQRLRPFYR
    cGA1-concensus      FLSPAFQEFL AGMRLIELLD SDRQEHQDLG LYHLKQINSP MMTVSAYNNF
    cGA1-concensus      LNYVSSLPST KAGPKIVSHL LHLVDNKESL ENISENDDYL KHQPEISLQM
    cGA1-concensus      QLLRGLWQIC PQAYFSMVSE HLLVLALKTA YQSNTVAACS PFVLQFLQGR
    cGA1-concensus      TLTLGALNLQ YFFDHPESLS LLRSIHFSIR GNKTSPRAHF SVLETCFDKS
    cGA1-concensus      QVPTIDQDYA SAFEPMNEWE RNLAEKEDNV KSYMDMQRRA SPDLSTGYWK
    cGA1-concensus      LSPKQYKIPC LEVDVNDIDV VGQDMLEILM TVFSASQRIE LHLNHSRGFI
    cGA1-concensus      ESIRPALELS KASVTKCSIS KLELSAAEQE LLLTLPSLES LEVSGTIQSQ
    cGA1-concensus      DQIFPNLDKF LCLKELSVDL EGNINVFSVI PEEFPNFHHM EKLLIQISAE
    cGA1-concensus      S
    __________________________________________________________________________
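The homology shown in Table 3 can be summarized numerically as percent identity over aligned columns. The following is a minimal sketch under stated assumptions, not the method used in the source: the function name `percent_identity` is hypothetical, columns where either sequence has a gap (shown as `.` in Table 3) are skipped by assumption, and the three strings are the rows for positions 301 to 350 copied from Table 3.

```python
def percent_identity(a: str, b: str, gap: str = ".") -> float:
    """Percent of identical residues over aligned, gap-free columns.

    Assumption: columns where either sequence carries the gap symbol are
    ignored. Other conventions (e.g. counting gaps as mismatches) would
    give different values.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


def row(text: str) -> str:
    """Remove the spaces that separate the ten-residue groups in Table 3."""
    return text.replace(" ", "")


# Rows for alignment positions 301-350, copied from Table 3.
cydia = row("MKQRPEQMAD AGFFYTGYGD NTKCFYCDGG LKDWEPEDVP WEQHVRWFDR")
orgyia = row("LKQRPEELAE AGFFYTGQGD KTRCFCCDGG LKDWEPDDAP WQQHARWYDR")
cga1 = row("SAVGVAALAK AGLFYTGIKD IVQCFSCGGC LEKWQEGDDP LDDHTRCFPN")

print("Cydia vs Orgyia:", percent_identity(cydia, orgyia))
print("Cydia vs cGA1:  ", percent_identity(cydia, cga1))
print("Orgyia vs cGA1: ", percent_identity(orgyia, cga1))
```

Percent identity over a single 50-residue window is only a rough indicator; a value for the whole alignment would require all rows of Table 3 and a stated convention for gaps.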

    TABLE 4

    Complete cDNA sequence and predicted amino acid sequence (SEQ ID NO: 2)
    of NAIP. Nucleotide residues G are in lower case "g" to clearly
    distinguish them from the residue C. Exon boundaries are as marked.
    Arrows underline the perfect 90 nucleotide tandem repeat in the 5' UTR
    region. The deleted exons 5 and 6 are shaded in gray. The regions of
    intra-protein repeated amino acid homology in the IAP domain are
    underlined. Sequence comparison with baculovirus IAPs is shown;
    identical residues are shaded dark grey and similar residues are shaded
    light grey. CpIAP and OpIAP refer to the IAPs encoded by the
    baculoviruses Cydia pomonella granulosis virus (CpGV) and Orgyia
    pseudotsugata nuclear polyhedrosis virus (OpMNPV), respectively. The
    sequence comparison has been extended to the cysteine/histidine putative
    DNA interacting region of the baculoviral IAPs, where no NAIP homology
    can be seen. The region showing significant similarity to chicken
    occludin is shown in grey in the 3' UTR.

    ##STR1## ##STR2## ##STR3## ##STR4##

REFERENCES

1. Albertson, H. M., Abderrahim, H., Cann, H. M., Dausset, J., Le Paslier, D. and Cohen, D. (1990). Construction and characterization of a yeast artificial chromosome library containing seven haploid genome equivalents. Proc. Natl. Acad. Sci. USA 87: 4256-4260.

2. Allsopp, T. E., Wyatt, S., Paterson, H. F., and Davies, A. M. (1993). The Proto-Oncogene bcl-2 Can Selectively Rescue Neurotrophic Factor-Dependent Neurons from Apoptosis. Cell 73, 295-307.

3. Bairoch, A., and Bucher, P. (1994). PROSITE: Recent Developments. Nucl. Acids Res. 22: 3583-3589.

4. Birnbaum, M. J., Clem, R. J., and Miller, L. K. (1994). An apoptosis-inhibiting gene from a nuclear polyhedrosis virus encoding a polypeptide with Cys-His sequence motifs. J. Virol. 68, 2521-2528.

5. Birnboim, H. C., and Doly, J. (1979). A rapid alkaline extraction procedure for screening recombinant plasmid DNA. Nucleic Acids Res. 7: 1513-1523.

6. Boultwood, J., Fidler, C., Lewis, S., Kelly, S., Sheridan, H., Littlewood, T. J., Buckle, V. J. and Wainscoat, J. S. (1994). Molecular mapping of uncharacteristically small 5q deletions in two patients with the 5q- syndrome: delineation of the critical region on 5q and identification of a 5q- breakpoint. Genomics 19: 425-432.

7. Burghes, A. H. M., Ingraham, S. E., McLean, M., Thompson, T. G., McPherson, J. D., Kote-Jarai, Z., Carpten, J. D., DiDonato, C. J., Ikeda, J-E., Surh, L., Wirth, B., Sargent, C. A., Ferguson-Smith, M. A., Fuerst, P., Moyzis, R. K., Grady, D. L., Zerres, K., Korneluk, R., MacKenzie, A. and Wasmuth, J. J. (1994). A multicopy dinucleotide marker that maps close to the spinal muscular atrophy gene. Genomics 21: 394-402.

8. Brzustowitcz, L. M., Lehner, T., Castilla, L. H., Penchaszadeh, G. K., Wilhelmsen, K. C., Daniels, R., Davies, K. E., Leppert, M., Ziter, F., Wood, D., Dubowitz, V., Zerres, K., Hausmanowa-Petrusewicz, I., Ott, J., Munsat, T. L. and Gilliam, T. C. (1990). Genetic mapping of chronic childhood-onset spinal muscular atrophy to chromosome 5q11.2-13.3. Nature 344: 540-541.

9. Church, D. M., Stotler, C. J., Rutter, J. L., Murrell, J. R., Trofatter, J. A., and Buckler, A. J. (1994). Isolation of genes from complex sources of mammalian genomic DNA using exon amplification. Nature Genet. 6, 98-105.

10. Clem, R. J., and Miller, L. K. (1994a). Induction and Inhibition of Apoptosis by Insect Viruses. Apoptosis II: The Molecular Basis of Apoptosis in Disease, Cold Spring Harbor Laboratory Press, pp. 89-110.

11. Clermont, O., Burlet, P., Burglen, L., Lefebvre, S., Pascal, F., McPherson, J., Wasmuth, J., Cohen, D., Le Paslier, D., Weissenbach, J., Lathrop, M., Munnich, A., and Melki, J. (1994). Use of genetic and physical mapping to locate the spinal muscular atrophy locus between two new highly polymorphic DNA markers. Am. J. Hum. Genet. 54: 687-694.

12. Dana, S., and Wasmuth, J. J. (1982). Linkage of the leuS, emtB, and chr genes on chromosome 5 in humans and expression of human genes encoding protein synthesis components in human-Chinese hamster hybrids. Somatic Cell Genet. 8: 245.

13. Davis, L. G., Dibner, M. D. and Battey, J. F. (1986). Basic Methods in Molecular Biology. Elsevier, New York.

14. Dear, S., and Staden, R. A. (1991). A Sequence Assembly and Editing Program for Efficient Management of Large Projects. Nucl. Acids Res. 19, 3907-3911.

15. DiDonato, C. J., Morgan, J., Carpten, J. D., Fuerst, P., Ingraham, S. E., Prescott, McPherson, J. D., Wirth, B., Zerres, K., Hurko, O., Wasmuth, J. J., Mendell, J. R., and Burghes, A. H. M. (1994). Association between Agl-CA Alleles and Severity of Autosomal Recessive Proximal Spinal Muscular Atrophy. Am. J. Hum. Genet. (in press).

16. Dubowitz, V. (1978). Muscle Disorders in Childhood, W. B. Saunders Co. Ltd., East Sussex, pp. 146-190.

17. Dubowitz, V. (1991). Chaos in classification of the spinal muscular atrophies of childhood. Neuromusc. Disord. 1: 47-53.

18. Feinberg, A. P. and Vogelstein, B. (1983). A technique for radiolabeling DNA restriction endonuclease fragments to high specific activity. Anal. Biochem. 132: 6-13.

19. Francis, M. J., Morrison, K. E., Campbell, L., Grewal, P. K., Christodoulou, Z., Daniels, R. J., Monaco, A. P., Frischauf, A. M., McPherson, J., Wasmuth, J. and Davies, K. E. (1993). A contig of non-chimeric YACs containing the spinal muscular atrophy gene in 5q13. Hum. Mol. Genet. 2: 1161-1167.

20. Frisch, S. M., and Francis, H. (1994). Disruption of Epithelial Cell-Matrix Interactions Induces Apoptosis. J. Cell Biol. 124, 619-626.

21. Garcia, I., Martinou, I., Tsujimoto, Y., and Martinou, J. C. (1992). Prevention of Programmed Cell Death of Sympathetic Neurons by the bcl-2 proto-oncogene. Science 258: 302-304.

22. Gilliam, T. C., Freimer, N. B., Kaufmann, C. A., Powhik, P. P., Bassett, A. S., Bengtsson, U., and Wasmuth, J. J. (1989). Deletion mapping of DNA markers to a region of chromosome 5 that cosegregates with schizophrenia. Genomics 5: 940-944.

23. Gilliam, T. C., Brzustowitcz, L. M., Castillo, L. H., Lehner, T., Penchaszadeh, G. K., Daniels, R. J., Byth, B. C., Knowles, J., Hislop, J. E., Shapira, Y., Dubowitz, V., Munsat, T. L., Ott, J. and Davies, K. E. (1990). Genetic homogeneity between acute and chronic forms of spinal muscular atrophy. Nature 345: 823-825.

24. Gleeson, R., and Hillier, L. (1991). A Trace Display and Editing Program for Data from Fluorescence Based Sequencing Machines. Nucl. Acids Res. 19, 6491-643, 47.

25. Hedrick, P. W. (1987). Gametic Disequilibrium Measures: Proceed with Caution. Genetics 117, 331-341.

26. Hockenberry, D., Nunez, G., Millman, C., Schreiber, R. D., and Korsmeyer, S. J. (1990). Bcl-2 is an Inner Mitochondrial Membrane Protein That Blocks Programmed Cell Death. Nature 348, 334-336.

27. Hudson, T. J., Englestein, M., Lee, M. K., Ho, E. C., Rubenfield, M. J., Adams, C. P., Housman, D. E., and Dracopoli, N. C. (1992). Isolation and chromosomal assignment of 100 highly informative human simple sequence repeat polymorphisms. Genomics 13: 622-629.

28. Ioannou, P. A., Amemiya, C. T., Garnes, J., Droisel, P. M., Shizuya, H., Chen, C., Batzer, M. A., de Jong, P. J. (1994). A New Bacteriophage P1-derived Vector for the Propagation of Large Human DNA Fragments. Nature Genet. 6, 84-89.

29. Jones, D. T., Taylor, W. R., and Thornton, J. M. (1994). A Model Recognition Approach to the Prediction of All-Helical Membrane Protein Structure and Topology. Biochemistry 33, 3038-3049.

30. Kleyn, P. W., Wang, C. H., Lien, L. L., Vitale, E., Pan, J., Ross, B. M., Grunn, A., Palmer, D. A., Warburton, D., Brzustowicz, L. M., Kunkel, L. M. and Gilliam, T. C. (1993). Construction of a yeast artificial chromosome contig spanning the spinal muscular atrophy disease gene region. Proc. Natl. Acad. Sci. USA 90: 6801-6805.

31. Kouprina, N., Eldarov, M., Moyzis, R., Resnick, M. and Larionov, V. (1994). A model system to assess the integrity of mammalian YACs during transformation and propagation in yeast. Genomics 21: 7-17.

32. Larin, Z., Monaco, A. P. and Lehrach, H. (1991). Yeast artificial chromosome libraries containing large inserts from mouse and human DNA. Proc. Natl. Acad. Sci. USA 88: 4123-4127.

33. Lien, L. L., Boyce, F. M., Kleyn, P., Brzustowicz, L. M., Menninger, J., Ward, D. C., Gilliam, T. C., and Kunkel, L. M. (1991). Mapping of human microtubule associated protein 1B in proximity to the spinal muscular atrophy locus at 5q13.1. Proc. Natl. Acad. Sci. USA 88: 7873-7876.

34. MacKenzie, A., Roy, N., Besner, A., Mettler, G., Jacob, P., Korneluk, R. and Surh, L. (1993). Genetic linkage analysis of Canadian spinal muscular atrophy kindreds using flanking microsatellite 5q13 polymorphisms. Hum. Genet. 90: 501-504.

35. Mankoo, B. S., Sherrington, R., De La Concha, A., Kalsi, G., Curtis, D., Melmer, G. and Gurling, H. M. D. (1991). Two microsatellite polymorphisms at the D5S39 locus. Nucleic Acids Res. 19: 1963.

36. McLean, M. D., Roy, N., MacKenzie, A. E., Salih, M., Burghes, A., Simard, L., Korneluk, R. G., Ikeda, J-E., and Surh, L. Two 5q13 simple tandem repeat loci are in linkage disequilibrium with type I spinal muscular atrophy. Hum. Mol. Genet., in press.

37. McLean, M., Roy, N., Yaragi, G., Shutler, K., Tamai, Mahadevan, S., Salih, M., Besner, A., Lefebvre, C., Kang, X., Aubry, H., Baird, S., Surh, L., Korneluk, R., MacKenzie, A. and Ikeda, J-E. (1994). Molecular genetics analysis of the spinal muscular atrophy region on 5q13.1. Ikeda GenoSPERE Project. 31-35.

38. Melki, J., Abdelhak, S., Sheth, P., Bachelot, M. F., Burlet, P., Marcadet, A., Aicardi, J., Barois, A., Carriere, J. P., Fardeau, M., Fontan, D., Ponsot, G., Billsette, T., Angeline, C., Barbosa, C., Ferriere, G., Lanzi, G., Ottolini, A., Babron, M. C., Cohen, D., Hanauer, A., Colerget-Darpox, F., Lathrop, M. (1993). Refined Linkage Map of Chromosome 5 in the Region of the Spinal Muscular Atrophy Gene. Genomics 15: 521-541.

40. Meredith, J. E., Fazeli, B., and Schwartz, M. A. (1993). The Extracellular Matrix as a Cell Survival Factor. Mol. Biol. Cell 4, 953-961.

41. Munnich, A. and Frezal, J. (1990). Gene for chronic proximal spinal muscular atrophies maps to chromosome 5q. Nature 344: 767-768.

42. Melki, J., Lefebvre, S., Burglen, L., Burlet, P., Clermont, O., Millasseau, P., Reboullet, S., Benichou, B., Zeviani, M., Le Paslier, D., Cohen, D., Weissenbach, J., and Munnich, A. (1994). De novo and inherited deletions of the 5q13 region in spinal muscular atrophies. Science 264: 1474-1477.

43. Neil, D. L., Villasante, A., Fisher, R. B., Vetrie, D., Cox, B. and Tyler-Smith, C. (1990). Structural instability of human tandemly repeated DNA sequences cloned in yeast artificial chromosome vectors. Nucleic Acids Res. 18: 1421-1428.

44. Nelson, D. L., Brownstein, B. H. (eds) (1993). YAC libraries: A user's guide. W. H. Freeman and Company, New York, pp. 86-89.

45. Meakin, S. O., and Shooter, E. M. (1992). The Nerve Growth Factor Family of Receptors. TINS 9, 323-331.

46. Oppenheim, R. W. (1991). Cell death during development of the nervous system. Annu. Rev. Neurosci. 14, 453-501.

47. Oshima, A., Kyle, J. W., Miller, R. D., Hoffmann, Powell, P. P., Grubb, J. H., Sly, W. S., Tropak, M., Guise, K. S., and Gravel, R. A. (1987). Cloning, sequencing and expression of cDNA for human beta-glucuronidase. Proc. Natl. Acad. Sci. USA 84: 685-689.

48. Podolsky, L., Tsilfidis, C., Baird, S., Korneluk, R. G., and MacKenzie, A. E. (1994). An Empiric Comparison of Linkage Disequilibrium Parameters in Disease Gene Localization: The Myotonic Dystrophy Experience. Am. J. Hum. Genet. 55, A932.

49. Raff, M. C., Barres, B. A., Burne, J. F., Coles, H. S., Ishizaki, Y., and Jacobson, M. D. (1993). Programmed Cell Death and the Control of Cell Survival: Lessons from the Nervous System. Science 262: 695-698.

50. Roy, N., McLean, M., Johnston, A., Lefebvre, C., Salih, M., Yaraghi, Z., Ikeda, J. E., Korneluk, R. G., MacKenzie, A. E. (1994). Refined physical map of the spinal muscular atrophy gene region at 5q13 based on YAC and cosmid contiguous arrays. Genomics (submitted).

51. Sambrook, J., Fritsch, E. F., and Maniatis, T. (1989). "Molecular Cloning: A Laboratory Manual," 2nd ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor.

52. Sarnat, H. B. (1992). Cerebral dysgenesis: Embryology and clinical expression, Oxford University Press, p. 107.

53. Scherer, S. and Tsui, L.-C. (1991). Cloning and analysis of large DNA molecules. In Adolph, K. W., ed. Advanced Techniques in Chromosome Research. Dekker, New York, pp. 33-72.

54. Soares, V. M., Brzustowicz, L. M., Kleyn, P. W., Knowles, J. A., Palmer, D. A., Asokan, S., Penchaszadeh, G. K., Munsat, T. L. and Gilliam, T. C. (1993). Refinement of the spinal muscular atrophy locus to the interval between D5S435 and MAP1B. Genomics 15: 365-371.

55. Sherrington, R., Melmer, G., Dixon, M., Curtis, D., Mankoo, B., Kalsi, G. and Gurling, H. (1991). Linkage disequilibrium between two highly polymorphic microsatellites. Am. J. Hum. Genet. 49: 966-971.

56. Shutler, G., Korneluk, R. G., Tsilfidis, C., Mahadaven, M., Bailly, J., Smeets, H., Jansen, G., Wieringa, B., Lohman, F., Asanidis, C., and de Jong, P. J. (1992). Physical mapping and cloning of the proximal segment of the Myotonic Dystrophy Gene Region. Genomics 13: 513-525.

57. Stahl, N., Boulton, T. G., Farruggella, T., Ip, N. Y., Davis, Witthuhn, B. A., Quelle, F. W., Silvennoinen, O., Barbieri, G., Pellegrini, S., Ihle, J. N., and Yancopoulos, G. D. (1994). Association and Activation of Jak-Tyk Kinases by CNTF-LIF-OSM-IL-6 β Receptor Components. Science 263, 92-95.

58. Stallings, R. L., Doggett, N. A., Okumura, K. and Ward, D. C. (1992). Chromosome 16-specific repetitive DNA sequences that map to chromosomal regions known to undergo breakage/rearrangement in leukemia cells. Genomics 13: 332-338.

59. Tagle, D. A., Collins, F. S. (1992). An optimized Alu-PCR primer pair for human-specific amplification of YACs and somatic cell hybrids. Hum. Molec. Genet. 1: 121-122.

60. The European Polycystic Kidney Disease Consortium (1994). The polycystic kidney disease gene encodes a 14 kb transcript and lies within a duplicated region on chromosome 16. Cell 77: 881-894.

61. Thompson, T. G., Morrison, K. E., Kleyn, P., Bengtsson, U., Gilliam, T. C., Davies, K. E., Wasmuth, J. J. and McPherson, J. D. (1993). High resolution physical map of the region surrounding the spinal muscular atrophy gene. Hum. Mol. Genet. 2: 1169-1176.

62. van der Steege, G., Cobben, J-M., Osinga, J., Schaffer, H., van Ommen, G-J. B., and Buys, C. H. C. M. A sublocus of the multicopy microsatellite marker CMS1 maps proximal to SMA as shown by recombinant analysis. Genomics, submitted.

63. Wirth, B., Voosen, B., Rohrig, D., Knapp, M., Piechaczek, B., Rudnik-Schoneborn, S. and Zerres, K. (1993). Fine mapping and narrowing of the genetic interval of the spinal muscular atrophy region by linkage studies. Genomics 15: 113-118.

64. Wirth, B., Pick, E., Leuter, A., Dadze, A., Voosen, B., Knapp, M., Piechaczak-Wappenschmidt, B., Rudnik-Schoneborn, S., Schonling, J., Cox, S., Spurr, N. K. and Zerres, K. (1994). Large linkage analysis in 100 families with autosomal recessive spinal muscular atrophy (SMA) and 11 CEPH families using 13 polymorphic loci in the region 5q11.2-q13.3. Genomics 20: 84-93.

65. Warrington, J. A., Bailey, S. K., Armstrong, E., Aprelikova, O., Alitolo, K., Dolganov, G. M., Wilcox, A. S., Sikela, J. M., Wolfe, S. F., Lovett, M., and Wasmuth, J. J. (1992). A radiation hybrid map of 18 growth factor, growth factor receptor, hormone receptor, neurotransmitter receptor genes on the distal region of the long arm of chromosome 5. Genomics 13: 803-808.

66. Yaraghi, Z., Kang, X., Ikeda, J-E. and MacKenzie, A. Hum. Mol. Genet., in press.

67. Yaraghi, Z., McLean, M., Roy, N., Surh, L., Ikeda, J-E., and MacKenzie, A. E. A recombination event occurring within the two complex 5q13.1 simple tandem repeat polymorphisms suggests a telomeric mapping of spinal muscular atrophy. Genomics, submitted.

    __________________________________________________________________________
    SEQUENCE LISTING

    <160> NUMBER OF SEQ ID NOS: 23

    <210> SEQ ID NO 1
    <211> LENGTH: 5502
    <212> TYPE: DNA
    <213> ORGANISM: Homo sapiens

    <400> SEQUENCE: 1

    ttccggctgg acgttgccct gtgtacctct tcgactgcct gttcatctac gacgaacccc     60
    gggtattgac cccagacaac aatgccactt catattgcat gaagacaaaa ggtcctgtgc    120
    tcacctggga cccttctgga cgttgccctg tgttcctctt cgcctgcctg ttcatctacg    180
    acgaaccccg ggtattgacc ccagacaaca atgccacttc atattgggga cttcgtctgg    240
    gattccaagg tgcattcatt gcaaagttcc ttaaatattt tctcactgct tcctactaaa    300
    ggacggacag agcatttgtt cttcagccac atactttcct tccactggcc agcattctcc    360
    tctattagac tagaactgtg gataaacctc agaaaatggc cacccagcag aaagcctctg    420
    acgagaggat ctcccagttt gatcacaatt tgctgccaga gctgtctgct cttctgggcc    480
    tagatgcagt tcagttggca aaggaactag aagaagagga gcagaaggag cgagcaaaaa    540
    tgcagaaagg ctacaactct caaatgcgca gtgaagcaaa aaggttaaag acttttgtga    600
    cttatgagcc gtacagctca tggataccac aggagatggc ggccgctggg ttttacttca    660
    ctggggtaaa atctgggatt cagtgcttct gctgtagcct aatcctcttt ggtgccggcc    720
    tcacgagact ccccatagaa gaccacaaga ggtttcatcc agattgtggg ttccttttga    780
    acaaggatgt tggtaacatt gccaagtacg acataagggt gaagaatctg aagagcaggc    840
    tgagaggagg taaaatgagg taccaagaag aggaggctag acttgcatcc ttcaggaact    900
    ggccatttta tgtccaaggg atatcccctt gtgtgctctc agaggctggc tttgtcttta    960
    caggtaaaca ggacacggta cagtgttttt cctgtggtgg atgtttagga aattgggaag   1020
    aaggagatga tccttggaag gaacatgcca aatggttccc caaatgtgaa tttcttcgga   1080
    gtaagaaatc ctcagaggaa attacccagt atattcaaag ctacaaggga tttgttgaca   1140
    taacgggaga acattttgtg aattcctggg tccagagaga attacctatg gcatcagctt   1200
    attgcaatga cagcatcttt gcttacgaag aactacggct ggactctttt aaggactggc   1260
    cccgggaatc agctgtggga gttgcagcac tggccaaagc aggtcttttc tacacaggta   1320
                                                               - - taaaggacat cgtccagtgc ttttcctgtg gagggtgttt agagaaatgg ca -            #ggaaggtg   1380                                                                 - - atgacccatt agacgatcac accagatgtt ttcccaattg tccatttctc ca -            #aaatatga   1440                                                                 - - agtcctctgc ggaagtgact ccagaccttc agagccgtgg tgaactttgt ga -            #attactgg   1500                                                                 - - aaaccacaag tgaaagcaat cttgaagatt caatagcagt tggtcctata gt -            #gccagaaa   1560                                                                 - - tggcacaggg tgaagcccag tggtttcaag aggcaaagaa tctgaatgag ca -            #gctgagag   1620                                                                 - - cagcttatac cagcgccagt ttccgccaca tgtctttgct tgatatctct tc -            #cgatctgg   1680                                                                 - - ccacggacca cttgctgggc tgtgatctgt ctattgcttc aaaacacatc ag -            #caaacctg   1740                                                                 - - tgcaagaacc tctggtgctg cctgaggtct ttggcaactt gaactctgtc at -            #gtgtgtgg   1800                                                                 - - agggtgaagc tggaagtgga aagacggtcc tcctgaagaa aatagctttt ct -            #gtgggcat   1860                                                                 - - ctggatgctg tcccctgtta aacaggttcc agctggtttt ctacctctcc ct -            #tagttcca   1920                                                                 - - ccagaccaga cgaggggctg gccagtatca tctgtgacca gctcctagag aa -            #agaaggat   1980                                                                 - - ctgttactga aatgtgcatg aggaacatta tccagcagtt aaagaatcag gt -            #cttattcc   2040                                                                 - - ttttagatga ctacaaagaa atatgttcaa tccctcaagt cataggaaaa ct -  
          #gattcaaa   2100                                                                 - - aaaaccactt atcccggacc tgcctattga ttgctgtccg tacaaacagg gc -            #cagggaca   2160                                                                 - - tccgccgata cctagagacc attctagaga tccaagcatt tcccttttat aa -            #tactgtct   2220                                                                 - - gtatattacg gaagctcttt tcacataata tgactcgtct gcgaaagttt at -            #ggtttact   2280                                                                 - - ttggaaagaa ccaaagtttg cagaagatac agaaaactcc tctctttgtg gc -            #ggcgatct   2340                                                                 - - gtgctcattg gtttcagtat ccttttgacc catcctttga tgatgtggct gt -            #tttcaagt   2400                                                                 - - cctatatgga acgcctttcc ttaaggaaca aagcgacagc tgaaattctc aa -            #agcaactg   2460                                                                 - - tgtcctcctg tggtgagctg gccttgaaag ggtttttttc atgttgcttt ga -            #gtttaatg   2520                                                                 - - atgatgatct cgcagaagca ggggttgatg aagatgaaga tctaaccatg tg -            #cttgatga   2580                                                                 - - gcaaatttac agcccagaga ctaagaccat tctaccggtt tttaagtcct gc -            #cttccaag   2640                                                                 - - aatttcttgc ggggatgagg ctgattgaac tcctggattc agataggcag ga -            #acatcaag   2700                                                                 - - atttgggact gtatcatttg aaacaaatca actcacccat gatgactgta ag -            #cgcctaca   2760                                                                 - - acaatttttt gaactatgtc tccagcctcc cttcaacaaa agcagggccc aa -            #aattgtgt   2820                                                                 - - ctcatttgct ccatttagtg gataacaaag 
agtcattgga gaatatatct ga -            #aaatgatg   2880                                                                 - - actacttaaa gcaccagcca gaaatttcac tgcagatgca gttacttagg gg -            #attgtggc   2940                                                                 - - aaatttgtcc acaagcttac ttttcaatgg tttcagaaca tttactggtt ct -            #tgccctga   3000                                                                 - - aaactgctta tcaaagcaac actgttgctg cgtgttctcc atttgttttg ca -            #attccttc   3060                                                                 - - aagggagaac actgactttg ggtgcgctta acttacagta ctttttcgac ca -            #cccagaaa   3120                                                                 - - gcttgtcatt gttgaggagc atccacttct caatacgagg aaataagaca tc -            #acccagag   3180                                                                 - - cacatttttc agttctggaa acatgttttg acaaatcaca ggtgccaact at -            #agatcagg   3240                                                                 - - actatgcttc tgcctttgaa cctatgaatg aatgggagcg aaatttagct ga -            #aaaagagg   3300                                                                 - - ataatgtaaa gagctatatg gatatgcagc gcagggcatc accagacctt ag -            #tactggct   3360                                                                 - - attggaaact ttctccaaag cagtacaaga ttccctgtct agaagtcgat gt -            #gaatgata   3420                                                                 - - ttgatgttgt aggccaggat atgcttgaga ttctaatgac agttttctca gc -            #ttcacagc   3480                                                                 - - gcatcgaact ccatttaaac cacagcagag gctttataga aagcatccgc cc -            #agctcttg   3540                                                                 - - agctgtctaa ggcctctgtc accaagtgct ccataagcaa gttggaactc ag -            #cgcagccg   3600                                                                 - - 
aacaggaact gcttctcacc ctgccttccc tggaatctct tgaagtctca gg -            #gacaatcc   3660                                                                 - - agtcacaaga ccaaatcttt cctaatctgg ataagttcct gtgcctgaaa ga -            #actgtctg   3720                                                                 - - tggatctgga gggcaatata aatgtttttt cagtcattcc tgaagaattt cc -            #aaacttcc   3780                                                                 - - accatatgga gaaattattg atccaaattt cagctgagta tgatccttcc aa -            #actagttg   3840                                                                 - - ccagtttgcc aaattttatt tctctgaaga tattaaatct tgaaggccag ca -            #atttcctg   3900                                                                 - - atgaggaaac atcagaaaaa tttgcctaca ttttaggttc tcttagtaac ct -            #ggaagaat   3960                                                                 - - tgatccttcc tactggggat ggaatttatc gagtggccaa actgatcatc ca -            #gcagtgtc   4020                                                                 - - agcagcttca ttgtctccga gtcctctcat ttttcaagac tttgaatgat ga -            #cagcgtgg   4080                                                                 - - tggaaattgg ttaaaaatgt gtctgcaggc acacaggacg tgccttcacc cc -            #catctgac   4140                                                                 - - tatgtggaaa gagttgacag tcccatggca tactcttcca atggcaaagt ga -            #atgacaag   4200                                                                 - - cggttttatc cagagtcttc ctataaatcc acgccggttc ctgaagtggt tc -            #aggagctt   4260                                                                 - - ccattaactt cgcctgtgga tgacttcagg cagcctcgtt acagcagcgg tg -            #gtaacttt   4320                                                                 - - gagacacctt caaaaagagc acctgcaaag ggaagagcag gaaggtcaaa ga -            #gaacagag   4380                                         
                        - - caagatcact atgagacaga ctacacaact ggcggcgagt cctgtgatga gc -            #tggaggag   4440                                                                 - - gactggatca gggaatatcc acctatcact tcagatcaac aaagacaact gt -            #acaagagg   4500                                                                 - - aattttgaca ctggcctaca ggaatacaag agcttacaat cagaacttga tg -            #agatcaat   4560                                                                 - - aaagaactct cccgtttgga taaagaattg gatgactata gagaagaaag tg -            #aagagtac   4620                                                                 - - atggctgctg ctgatgaata caatagactg aagcaagtga agggatctgc ag -            #attacaaa   4680                                                                 - - agtaagaaga atcattgcaa gcagttaaac agcaaattgt cacacatcaa ga -            #agatggtt   4740                                                                 - - ggagactatg atagacagaa aacatagaag gctgatgcca agttgtttga ga -            #aattaagt   4800                                                                 - - atctgacatc tctgcaatct tctcagaagg caaatgactt tggaccataa cc -            #ccggaagc   4860                                                                 - - caaacctctg tgagcatcac agttttggtt gctttaatat catcagtatt ga -            #agcatttt   4920                                                                 - - ataaatcgct tttgataatc aactgggctg aacactccaa ttaaggattt ta -            #tgctttaa   4980                                                                 - - acattggttc ttgtattaag aatgaaatac tgtttgaggt ttttaagcct ta -            #aaggaagg   5040                                                                 - - ttctggtgtg aactaaactt tcacacccca gacgatgtct tcatacctac at -            #gtatttgt   5100                                                                 - - ttgcataggt gatctcattt aatcctctca accacctttc agataactgt ta -            #tttataat   5160             
                                                    - - cacttttttc cacataagga aactgggttc ctgcaatgaa gtctctgaag tg -            #aaactgct   5220                                                                 - - tgtttcctag cacacacttt tggttaagtc tgttttatga cttcattaat aa -            #taaattcc   5280                                                                 - - ggcatcatac agctactcct ccctaccgcc acctccacag acaccactct cc -            #tggttcca   5340                                                                 - - tctcctctgc tgcttctagc tccctgctct ggcttcaagg tgcgcaggac ct -            #gcttcctt   5400                                                                 - - ggtgatcctc tgtagtctcc cacaccccac attatctaca aactgatgac tc -            #ctaattta   5460                                                                 - - catctccagc tcagacctct ccatcaatcc caacgcatac ac    - #                      - #5502                                                                     - -  - - <210> SEQ ID NO 2                                                   <211> LENGTH: 1232                                                            <212> TYPE: PRT                                                               <213> ORGANISM: Homo sapiens                                                   - - <400> SEQUENCE: 2                                                         - - Met Ala Thr Gln Gln Lys Ala Ser Asp Glu Ar - #g Ile Ser Gln Phe Asp        1               5 - #                 10 - #                 15              - - His Asn Leu Leu Pro Glu Leu Ser Ala Leu Le - #u Gly Leu Asp Ala Val                   20     - #             25     - #             30                  - - Gln Leu Ala Lys Glu Leu Glu Glu Glu Glu Gl - #n Lys Glu Arg Ala Lys               35         - #         40         - #         45                      - - Met Gln Lys Gly Tyr Asn Ser Gln Met Arg Se - #r Glu Ala Lys Arg Leu           50             - #     55             - #     60                         
 - - Lys Thr Phe Val Thr Tyr Glu Pro Tyr Ser Se - #r Trp Ile Pro Gln Glu       65                 - # 70                 - # 75                 - # 80       - - Met Ala Ala Ala Gly Phe Tyr Phe Thr Gly Va - #l Lys Ser Gly Ile Gln                       85 - #                 90 - #                 95              - - Cys Phe Cys Cys Ser Leu Ile Leu Phe Gly Al - #a Gly Leu Thr Arg Leu                  100      - #           105      - #           110                  - - Pro Ile Glu Asp His Lys Arg Phe His Pro As - #p Cys Gly Phe Leu Leu              115          - #       120          - #       125                      - - Asn Lys Asp Val Gly Asn Ile Ala Lys Tyr As - #p Ile Arg Val Lys Asn          130              - #   135              - #   140                          - - Leu Lys Ser Arg Leu Arg Gly Gly Lys Met Ar - #g Tyr Gln Glu Glu Glu      145                 1 - #50                 1 - #55                 1 -      #60                                                                              - - Ala Arg Leu Ala Ser Phe Arg Asn Trp Pro Ph - #e Tyr Val Gln Gly        Ile                                                                                             165  - #               170  - #               175             - - Ser Pro Cys Val Leu Ser Glu Ala Gly Phe Va - #l Phe Thr Gly Lys Gln                  180      - #           185      - #           190                  - - Asp Thr Val Gln Cys Phe Ser Cys Gly Gly Cy - #s Leu Gly Asn Trp Glu              195          - #       200          - #       205                      - - Glu Gly Asp Asp Pro Trp Lys Glu His Ala Ly - #s Trp Phe Pro Lys Cys          210              - #   215              - #   220                          - - Glu Phe Leu Arg Ser Lys Lys Ser Ser Glu Gl - #u Ile Thr Gln Tyr Ile      225                 2 - #30                 2 - #35                 2 -      #40                                                                              - - Gln Ser Tyr Lys Gly Phe Val Asp Ile Thr Gl 
- #y Glu His Phe Val        Asn                                                                                             245  - #               250  - #               255             - - Ser Trp Val Gln Arg Glu Leu Pro Met Ala Se - #r Ala Tyr Cys Asn Asp                  260      - #           265      - #           270                  - - Ser Ile Phe Ala Tyr Glu Glu Leu Arg Leu As - #p Ser Phe Lys Asp Trp              275          - #       280          - #       285                      - - Pro Arg Glu Ser Ala Val Gly Val Ala Ala Le - #u Ala Lys Ala Gly Leu          290              - #   295              - #   300                          - - Phe Tyr Thr Gly Ile Lys Asp Ile Val Gln Cy - #s Phe Ser Cys Gly Gly      305                 3 - #10                 3 - #15                 3 -      #20                                                                              - - Cys Leu Glu Lys Trp Gln Glu Gly Asp Asp Pr - #o Leu Asp Asp His        Thr                                                                                             325  - #               330  - #               335             - - Arg Cys Phe Pro Asn Cys Pro Phe Leu Gln As - #n Met Lys Ser Ser Ala                  340      - #           345      - #           350                  - - Glu Val Thr Pro Asp Leu Gln Ser Arg Gly Gl - #u Leu Cys Glu Leu Leu              355          - #       360          - #       365                      - - Glu Thr Thr Ser Glu Ser Asn Leu Glu Asp Se - #r Ile Ala Val Gly Pro          370              - #   375              - #   380                          - - Ile Val Pro Glu Met Ala Gln Gly Glu Ala Gl - #n Trp Phe Gln Glu Ala      385                 3 - #90                 3 - #95                 4 -      #00                                                                              - - Lys Asn Leu Asn Glu Gln Leu Arg Ala Ala Ty - #r Thr Ser Ala Ser        Phe                                                                                             405 
 - #               410  - #               415             - - Arg His Met Ser Leu Leu Asp Ile Ser Ser As - #p Leu Ala Thr Asp His                  420      - #           425      - #           430                  - - Leu Leu Gly Cys Asp Leu Ser Ile Ala Ser Ly - #s His Ile Ser Lys Pro              435          - #       440          - #       445                      - - Val Gln Glu Pro Leu Val Leu Pro Glu Val Ph - #e Gly Asn Leu Asn Ser          450              - #   455              - #   460                          - - Val Met Cys Val Glu Gly Glu Ala Gly Ser Gl - #y Lys Thr Val Leu Leu      465                 4 - #70                 4 - #75                 4 -      #80                                                                              - - Lys Lys Ile Ala Phe Leu Trp Ala Ser Gly Cy - #s Cys Pro Leu Leu        Asn                                                                                             485  - #               490  - #               495             - - Arg Phe Gln Leu Val Phe Tyr Leu Ser Leu Se - #r Ser Thr Arg Pro Asp                  500      - #           505      - #           510                  - - Glu Gly Leu Ala Ser Ile Ile Cys Asp Gln Le - #u Leu Glu Lys Glu Gly              515          - #       520          - #       525                      - - Ser Val Thr Glu Met Cys Met Arg Asn Ile Il - #e Gln Gln Leu Lys Asn          530              - #   535              - #   540                          - - Gln Val Leu Phe Leu Leu Asp Asp Tyr Lys Gl - #u Ile Cys Ser Ile Pro      545                 5 - #50                 5 - #55                 5 -      #60                                                                              - - Gln Val Ile Gly Lys Leu Ile Gln Lys Asn Hi - #s Leu Ser Arg Thr        Cys                                                                                             565  - #               570  - #               575             - - Leu Leu Ile Ala Val Arg Thr Asn Arg Ala Ar - #g Asp Ile Arg Arg 
Tyr                  580      - #           585      - #           590                  - - Leu Glu Thr Ile Leu Glu Ile Gln Ala Phe Pr - #o Phe Tyr Asn Thr Val              595          - #       600          - #       605                      - - Cys Ile Leu Arg Lys Leu Phe Ser His Asn Me - #t Thr Arg Leu Arg Lys          610              - #   615              - #   620                          - - Phe Met Val Tyr Phe Gly Lys Asn Gln Ser Le - #u Gln Lys Ile Gln Lys      625                 6 - #30                 6 - #35                 6 -      #40                                                                              - - Thr Pro Leu Phe Val Ala Ala Ile Cys Ala Hi - #s Trp Phe Gln Tyr        Pro                                                                                             645  - #               650  - #               655             - - Phe Asp Pro Ser Phe Asp Asp Val Ala Val Ph - #e Lys Ser Tyr Met Glu                  660      - #           665      - #           670                  - - Arg Leu Ser Leu Arg Asn Lys Ala Thr Ala Gl - #u Ile Leu Lys Ala Thr              675          - #       680          - #       685                      - - Val Ser Ser Cys Gly Glu Leu Ala Leu Lys Gl - #y Phe Phe Ser Cys Cys          690              - #   695              - #   700                          - - Phe Glu Phe Asn Asp Asp Asp Leu Ala Glu Al - #a Gly Val Asp Glu Asp      705                 7 - #10                 7 - #15                 7 -      #20                                                                              - - Glu Asp Leu Thr Met Cys Leu Met Ser Lys Ph - #e Thr Ala Gln Arg        Leu                                                                                             725  - #               730  - #               735             - - Arg Pro Phe Tyr Arg Phe Leu Ser Pro Ala Ph - #e Gln Glu Phe Leu Ala                  740      - #           745      - #           750                  - - Gly Met Arg Leu Ile Glu Leu Leu Asp 
Ser As - #p Arg Gln Glu His Gln              755          - #       760          - #       765                      - - Asp Leu Gly Leu Tyr His Leu Lys Gln Ile As - #n Ser Pro Met Met Thr          770              - #   775              - #   780                          - - Val Ser Ala Tyr Asn Asn Phe Leu Asn Tyr Va - #l Ser Ser Leu Pro Ser      785                 7 - #90                 7 - #95                 8 -      #00                                                                              - - Thr Lys Ala Gly Pro Lys Ile Val Ser His Le - #u Leu His Leu Val        Asp                                                                                             805  - #               810  - #               815             - - Asn Lys Glu Ser Leu Glu Asn Ile Ser Glu As - #n Asp Asp Tyr Leu Lys                  820      - #           825      - #           830                  - - His Gln Pro Glu Ile Ser Leu Gln Met Gln Le - #u Leu Arg Gly Leu Trp              835          - #       840          - #       845                      - - Gln Ile Cys Pro Gln Ala Tyr Phe Ser Met Va - #l Ser Glu His Leu Leu          850              - #   855              - #   860                          - - Val Leu Ala Leu Lys Thr Ala Tyr Gln Ser As - #n Thr Val Ala Ala Cys      865                 8 - #70                 8 - #75                 8 -      #80                                                                              - - Ser Pro Phe Val Leu Gln Phe Leu Gln Gly Ar - #g Thr Leu Thr Leu        Gly                                                                                             885  - #               890  - #               895             - - Ala Leu Asn Leu Gln Tyr Phe Phe Asp His Pr - #o Glu Ser Leu Ser Leu                  900      - #           905      - #           910                  - - Leu Arg Ser Ile His Phe Ser Ile Arg Gly As - #n Lys Thr Ser Pro Arg              915          - #       920          - #       925                      - - Ala His 
Phe Ser Val Leu Glu Thr Cys Phe As - #p Lys Ser Gln Val Pro          930              - #   935              - #   940                          - - Thr Ile Asp Gln Asp Tyr Ala Ser Ala Phe Gl - #u Pro Met Asn Glu Trp      945                 9 - #50                 9 - #55                 9 -      #60                                                                              - - Glu Arg Asn Leu Ala Glu Lys Glu Asp Asn Va - #l Lys Ser Tyr Met        Asp                                                                                             965  - #               970  - #               975             - - Met Gln Arg Arg Ala Ser Pro Asp Leu Ser Th - #r Gly Tyr Trp Lys Leu                  980      - #           985      - #           990                  - - Ser Pro Lys Gln Tyr Lys Ile Pro Cys Leu Gl - #u Val Asp Val Asn Asp              995          - #      1000           - #     1005                      - - Ile Asp Val Val Gly Gln Asp Met Leu Glu Il - #e Leu Met Thr Val Phe         1010              - #  1015               - # 1020                          - - Ser Ala Ser Gln Arg Ile Glu Leu His Leu As - #n His Ser Arg Gly Phe      1025               1030 - #               1035  - #              1040          - - Ile Glu Ser Ile Arg Pro Ala Leu Glu Leu Se - #r Lys Ala Ser Val Thr                     1045  - #              1050   - #             1055              - - Lys Cys Ser Ile Ser Lys Leu Glu Leu Ser Al - #a Ala Glu Gln Glu Leu                 1060      - #          1065       - #         1070                  - - Leu Leu Thr Leu Pro Ser Leu Glu Ser Leu Gl - #u Val Ser Gly Thr Ile             1075          - #      1080           - #     1085                      - - Gln Ser Gln Asp Gln Ile Phe Pro Asn Leu As - #p Lys Phe Leu Cys Leu         1090              - #  1095               - # 1100                          - - Lys Glu Leu Ser Val Asp Leu Glu Gly Asn Il - #e Asn Val Phe Ser Val      1105               1110 - #               1115  - #            
  1120

Ile Pro Glu Glu Phe Pro Asn Phe His His Met Glu Lys Leu Leu Ile
                1125                1130                1135

Gln Ile Ser Ala Glu Tyr Asp Pro Ser Lys Leu Val Ala Ser Leu Pro
            1140                1145                1150

Asn Phe Ile Ser Leu Lys Ile Leu Asn Leu Glu Gly Gln Gln Phe Pro
        1155                1160                1165

Asp Glu Glu Thr Ser Glu Lys Phe Ala Tyr Ile Leu Gly Ser Leu Ser
    1170                1175                1180

Asn Leu Glu Glu Leu Ile Leu Pro Thr Gly Asp Gly Ile Tyr Arg Val
1185                1190                1195                1200

Ala Lys Leu Ile Ile Gln Gln Cys Gln Gln Leu His Cys Leu Arg Val
                1205                1210                1215

Leu Ser Phe Phe Lys Thr Leu Asn Asp Asp Ser Val Val Glu Ile Gly
            1220                1225                1230

<210> SEQ ID NO 3
<211> LENGTH: 21
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Description of Artificial Sequence: PCR primer

<400> SEQUENCE: 3

atgcttggat ctctagaatg g                                                 21

<210> SEQ ID NO 4
<211> LENGTH: 20
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Description of Artificial Sequence: PCR primer

<400> SEQUENCE: 4

agcaaagaca tgtggcggaa                                                   20

<210> SEQ ID NO 5
<211> LENGTH: 22
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Description of Artificial Sequence: PCR primer

<400> SEQUENCE: 5

ccagctccta gagaaagaag ga                                                22

<210> SEQ ID NO 6
<211> LENGTH: 21
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Description of Artificial Sequence: PCR primer

<400> SEQUENCE: 6

gaactacggc tggactcttt t                                                 21

<210> SEQ ID NO 7
<211> LENGTH: 21
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Description of Artificial Sequence: PCR primer

<400> SEQUENCE: 7

ctctcagcct gctcttcaga t                                                 21

<210> SEQ ID NO 8
<211> LENGTH: 21
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Description of Artificial Sequence: PCR primer

<400> SEQUENCE: 8

aaagcctctg acgagaggat c                                                 21

<210> SEQ ID NO 9
<211> LENGTH: 21
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Description of Artificial Sequence: PCR primer

<400> SEQUENCE: 9

cgactgcctg ttcatctacg a                                                 21

<210> SEQ ID NO 10
<211> LENGTH: 21
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Description of Artificial Sequence: PCR primer

<400> SEQUENCE: 10

tttgttctcc agccacatac t                                                 21

<210> SEQ ID NO 11
<211> LENGTH: 22
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Description of Artificial Sequence: PCR primer

<400> SEQUENCE: 11

catttggcat gttccttcca
ag           - #                  - #                     22                                                                     - -  - - <210> SEQ ID NO 12                                                  <211> LENGTH: 27                                                              <212> TYPE: DNA                                                               <213> ORGANISM: Artificial Sequence                                           <220> FEATURE:                                                                <223> OTHER INFORMATION: Description of Artificial - #Sequence: PCR          primer                                                                           - - <400> SEQUENCE: 12                                                        - - gtagatgaat actgatgttt cataatt          - #                  - #                 27                                                                     - -  - - <210> SEQ ID NO 13                                                  <211> LENGTH: 21                                                              <212> TYPE: DNA                                                               <213> ORGANISM: Artificial Sequence                                           <220> FEATURE:                                                                <223> OTHER INFORMATION: Description of Artificial - #Sequence: PCR          primer                                                                           - - <400> SEQUENCE: 13                                                        - - tgccactgcc aggcaatcta a           - #                  - #                      - #21                                                                  - -  - - <210> SEQ ID NO 14                                                  <211> LENGTH: 21                                                              <212> TYPE: DNA                                                               <213> ORGANISM: Artificial Sequence                                           
<220> FEATURE:                                                                <223> OTHER INFORMATION: Description of Artificial - #Sequence: PCR          primer                                                                           - - <400> SEQUENCE: 14                                                        - - taaacaggac acggtacagt g           - #                  - #                      - #21                                                                  - -  - - <210> SEQ ID NO 15                                                  <211> LENGTH: 24                                                              <212> TYPE: DNA                                                               <213> ORGANISM: Artificial Sequence                                           <220> FEATURE:                                                                <223> OTHER INFORMATION: Description of Artificial - #Sequence: PCR          primer                                                                           - - <400> SEQUENCE: 15                                                        - - catgttttaa gtctcggtgc tctg          - #                  - #                    24                                                                     - -  - - <210> SEQ ID NO 16                                                  <211> LENGTH: 23                                                              <212> TYPE: DNA                                                               <213> ORGANISM: Artificial Sequence                                           <220> FEATURE:                                                                <223> OTHER INFORMATION: Description of Artificial - #Sequence: PCR          primer                                                                           - - <400> SEQUENCE: 16                                                        - - ttagccagat gtgttggcac atg           - #                  - #                    23                                        
                             - -  - - <210> SEQ ID NO 17                                                  <211> LENGTH: 22                                                              <212> TYPE: DNA                                                               <213> ORGANISM: Artificial Sequence                                           <220> FEATURE:                                                                <223> OTHER INFORMATION: Description of Artificial - #Sequence: PCR          primer                                                                           - - <400> SEQUENCE: 17                                                        - - gattctatgt gataggcagc ca           - #                  - #                     22                                                                     - -  - - <210> SEQ ID NO 18                                                  <211> LENGTH: 21                                                              <212> TYPE: DNA                                                               <213> ORGANISM: Artificial Sequence                                           <220> FEATURE:                                                                <223> OTHER INFORMATION: Description of Artificial - #Sequence: PCR          primer                                                                           - - <400> SEQUENCE: 18                                                        - - gccactgctc ccgatggatt a           - #                  - #                      - #21                                                                  - -  - - <210> SEQ ID NO 19                                                  <211> LENGTH: 22                                                              <212> TYPE: DNA                                                               <213> ORGANISM: Artificial Sequence                                           <220> FEATURE:                                                                <223> OTHER 
INFORMATION: Description of Artificial - #Sequence: PCR          primer                                                                           - - <400> SEQUENCE: 19                                                        - - gctctcagct gctcattcag at           - #                  - #                     22                                                                     - -  - - <210> SEQ ID NO 20                                                  <211> LENGTH: 21                                                              <212> TYPE: DNA                                                               <213> ORGANISM: Artificial Sequence                                           <220> FEATURE:                                                                <223> OTHER INFORMATION: Description of Artificial - #Sequence: PCR          primer                                                                           - - <400> SEQUENCE: 20                                                        - - acaaagttca ccacggctct g           - #                  - #                      - #21                                                                  - -  - - <210> SEQ ID NO 21                                                  <211> LENGTH: 275                                                             <212> TYPE: PRT                                                               <213> ORGANISM: Cydia pomonella                                               <220> FEATURE:                                                                <223> OTHER INFORMATION: Description of Artificial - #Sequence: PCR          primer                                                                           - - <400> SEQUENCE: 21                                                        - - Met Ser Asp Leu Arg Leu Glu Glu Val Arg Le - #u Asn Thr Phe Glu        Lys                                                                               1               5 - #                 10 - #               
  15             - - Trp Pro Val Ser Phe Leu Ser Pro Glu Thr Me - #t Ala Lys Asn Gly Phe                   20     - #             25     - #             30                  - - Tyr Tyr Leu Gly Arg Ser Asp Glu Val Arg Cy - #s Ala Phe Cys Lys Val               35         - #         40         - #         45                      - - Glu Ile Met Arg Trp Lys Glu Gly Glu Asp Pr - #o Ala Ala Asp His Lys           50             - #     55             - #     60                          - - Lys Trp Ala Pro Gln Cys Pro Phe Val Lys Gl - #y Ile Asp Val Cys Gly       65                 - # 70                 - # 75                 - # 80       - - Ser Ile Val Thr Thr Asn Asn Ile Gln Asn Th - #r Thr Thr His Asp Thr                       85 - #                 90 - #                 95              - - Ile Ile Gly Pro Ala His Pro Lys Tyr Ala Hi - #s Glu Ala Ala Arg Val                  100      - #           105      - #           110                  - - Lys Ser Phe His Asn Trp Pro Arg Cys Met Ly - #s Gln Arg Pro Glu Gln              115          - #       120          - #       125                      - - Met Ala Asp Ala Gly Phe Phe Tyr Thr Gly Ty - #r Gly Asp Asn Thr Lys          130              - #   135              - #   140                          - - Cys Phe Tyr Cys Asp Gly Gly Leu Lys Asp Tr - #p Glu Pro Glu Asp Val      145                 1 - #50                 1 - #55                 1 -      #60                                                                              - - Pro Trp Glu Gln His Val Arg Trp Phe Asp Ar - #g Cys Ala Tyr Val        Gln                                                                                             165  - #               170  - #               175             - - Leu Val Lys Gly Arg Asp Tyr Val Gln Lys Va - #l Ile Thr Glu Ala Cys                  180      - #           185      - #           190                  - - Val Leu Pro Gly Glu Asn Thr Thr Val Ser Th - #r Ala Ala Pro Val Ser              195          - #       
200          - #       205                      - - Glu Pro Ile Pro Glu Thr Lys Ile Glu Lys Gl - #u Pro Gln Val Glu Asp          210              - #   215              - #   220                          - - Ser Lys Leu Cys Lys Ile Cys Tyr Val Glu Gl - #u Cys Ile Val Cys Phe      225                 2 - #30                 2 - #35                 2 -      #40                                                                              - - Val Pro Cys Gly His Val Val Ala Cys Ala Ly - #s Cys Ala Leu Ser        Val                                                                                             245  - #               250  - #               255             - - Asp Lys Cys Pro Met Cys Arg Lys Ile Val Th - #r Ser Val Leu Lys Val                  260      - #           265      - #           270                  - - Tyr Phe Ser                                                                      275                                                                    - -  - - <210> SEQ ID NO 22                                                  <211> LENGTH: 268                                                             <212> TYPE: PRT                                                               <213> ORGANISM: Orgyia pseudotsugata                                           - - <400> SEQUENCE: 22                                                        - - Met Ser Ser Arg Ala Ile Gly Ala Pro Gln Gl - #u Gly Ala Asp Met Lys        1               5 - #                 10 - #                 15              - - Asn Lys Ala Ala Arg Leu Gly Thr Tyr Thr As - #n Trp Pro Val Gln Phe                   20     - #             25     - #             30                  - - Leu Glu Pro Ser Arg Met Ala Ala Ser Gly Ph - #e Tyr Tyr Leu Gly Arg               35         - #         40         - #         45                      - - Gly Asp Glu Val Arg Cys Ala Phe Cys Lys Va - #l Glu Ile Thr Asn Trp           50             - #     55             - #     60                          - 
- Val Arg Gly Asp Asp Pro Glu Thr Asp His Ly - #s Arg Trp Ala Pro Gln       65                 - # 70                 - # 75                 - # 80       - - Cys Pro Phe Val Arg Asn Asn Ala His Asp Th - #r Pro His Asp Arg Ala                       85 - #                 90 - #                 95              - - Pro Pro Ala Arg Ser Ala Ala Ala His Pro Gl - #n Tyr Ala Thr Glu Ala                  100      - #           105      - #           110                  - - Ala Arg Leu Arg Thr Phe Ala Glu Trp Pro Ar - #g Gly Leu Lys Gln Arg              115          - #       120          - #       125                      - - Pro Glu Glu Leu Ala Glu Ala Gly Phe Phe Ty - #r Thr Gly Gln Gly Asp          130              - #   135              - #   140                          - - Lys Thr Arg Cys Phe Cys Cys Asp Gly Gly Le - #u Lys Asp Trp Glu Pro      145                 1 - #50                 1 - #55                 1 -      #60                                                                              - - Asp Asp Ala Pro Trp Gln Gln His Ala Arg Tr - #p Tyr Asp Arg Cys        Glu                                                                                             165  - #               170  - #               175             - - Tyr Val Leu Leu Val Lys Gly Arg Asp Phe Va - #l Gln Arg Val Met Thr                  180      - #           185      - #           190                  - - Glu Ala Cys Val Val Arg Asp Ala Asp Asn Gl - #u Pro His Ile Glu Arg              195          - #       200          - #       205                      - - Pro Ala Val Glu Ala Glu Val Ala Asp Asp Ar - #g Leu Cys Lys Ile Cys          210              - #   215              - #   220                          - - Leu Gly Ala Glu Lys Thr Val Cys Phe Val Pr - #o Cys Gly His Val Val      225                 2 - #30                 2 - #35                 2 -      #40                                                                              - - Ala Cys Gly Lys Cys Ala Ala Gly Val Thr Th - 
#r Cys Pro Val Cys        Arg                                                                                             245  - #               250  - #               255             - - Gly Gln Leu Asp Lys Ala Val Arg Met Tyr Gl - #n Val                                  260      - #           265                                         - -  - - <210> SEQ ID NO 23                                                  <211> LENGTH: 1151                                                            <212> TYPE: PRT                                                               <213> ORGANISM: Homo sapiens                                                   - - <400> SEQUENCE: 23                                                        - - Thr Arg Thr Val Asp Lys Pro Gln Lys Met Al - #a Thr Gln Gln Lys Ala        1               5 - #                 10 - #                 15              - - Ser Asp Glu Arg Ile Ser Gln Phe Asp His As - #n Leu Leu Pro Glu Leu                   20     - #             25     - #             30                  - - Ser Ala Leu Leu Gly Leu Asp Ala Val Gln Le - #u Ala Lys Glu Leu Glu               35         - #         40         - #         45                      - - Glu Glu Glu Gln Lys Glu Arg Ala Lys Met Gl - #n Lys Gly Tyr Asn Ser           50             - #     55             - #     60                          - - Gln Met Arg Ser Glu Ala Lys Arg Leu Lys Th - #r Phe Val Thr Tyr Glu       65                 - # 70                 - # 75                 - # 80       - - Pro Tyr Ser Ser Trp Ile Pro Gln Glu Met Al - #a Ala Ala Gly Phe Tyr                       85 - #                 90 - #                 95              - - Phe Thr Gly Val Lys Ser Gly Ile Gln Cys Ph - #e Cys Cys Ser Leu Ile                  100      - #           105      - #           110                  - - Leu Phe Gly Ala Gly Leu Thr Arg Leu Pro Il - #e Glu Asp His Lys Arg              115          - #       120          - #       125                      - - Phe His Pro Asp 
Cys Gly Phe Leu Leu Asn Ly - #s Asp Val Gly Asn Ile          130              - #   135              - #   140                          - - Ala Lys Tyr Asp Ile Arg Val Lys Asn Leu Ly - #s Ser Arg Leu Arg Gly      145                 1 - #50                 1 - #55                 1 -      #60                                                                              - - Gly Lys Met Arg Tyr Gln Glu Glu Glu Ala Ar - #g Leu Ala Ser Phe        Arg                                                                                             165  - #               170  - #               175             - - Asn Trp Pro Phe Tyr Val Gln Gly Ile Ser Pr - #o Cys Val Leu Ser Glu                  180      - #           185      - #           190                  - - Ala Gly Phe Val Phe Thr Gly Lys Gln Asp Th - #r Val Gln Cys Phe Ser              195          - #       200          - #       205                      - - Cys Gly Gly Cys Leu Gly Asn Trp Glu Glu Gl - #y Asp Asp Pro Trp Lys          210              - #   215              - #   220                          - - Glu His Ala Lys Trp Phe Pro Lys Cys Glu Ph - #e Leu Arg Ser Lys Lys      225                 2 - #30                 2 - #35                 2 -      #40                                                                              - - Ser Ser Glu Glu Ile Thr Gln Tyr Ile Gln Se - #r Tyr Lys Gly Phe        Val                                                                                             245  - #               250  - #               255             - - Asp Ile Thr Gly Glu His Phe Val Asn Ser Tr - #p Val Gln Arg Glu Leu                  260      - #           265      - #           270                  - - Pro Met Ala Ser Ala Tyr Cys Asn Asp Ser Il - #e Phe Ala Tyr Glu Glu              275          - #       280          - #       285                      - - Leu Arg Leu Asp Ser Phe Lys Asp Trp Pro Ar - #g Glu Ser Ala Val Gly          290              - #   295              - #   300                  
        - - Val Ala Ala Leu Ala Lys Ala Gly Leu Phe Ty - #r Thr Gly Ile Lys Asp      305                 3 - #10                 3 - #15                 3 -      #20                                                                              - - Ile Val Gln Cys Phe Ser Cys Gly Gly Cys Le - #u Glu Lys Trp Gln        Glu                                                                                             325  - #               330  - #               335             - - Gly Asp Asp Pro Leu Asp Asp His Thr Arg Cy - #s Phe Pro Asn Cys Pro                  340      - #           345      - #           350                  - - Phe Leu Gln Asn Met Lys Ser Ser Ala Glu Va - #l Thr Pro Asp Leu Gln              355          - #       360          - #       365                      - - Ser Arg Gly Glu Leu Cys Glu Leu Leu Glu Th - #r Thr Ser Glu Ser Asn          370              - #   375              - #   380                          - - Leu Glu Asp Ser Ile Ala Val Gly Pro Ile Va - #l Pro Glu Met Ala Gln      385                 3 - #90                 3 - #95                 4 -      #00                                                                              - - Gly Glu Ala Gln Trp Phe Gln Glu Ala Lys As - #n Leu Asn Glu Gln        Leu                                                                                             405  - #               410  - #               415             - - Arg Ala Ala Tyr Thr Ser Ala Ser Phe Arg Hi - #s Met Ser Leu Leu Asp                  420      - #           425      - #           430                  - - Ile Ser Ser Asp Leu Ala Thr Asp His Leu Le - #u Gly Cys Asp Leu Ser              435          - #       440          - #       445                      - - Ile Ala Ser Lys His Ile Ser Lys Pro Val Gl - #n Glu Pro Leu Val Leu          450              - #   455              - #   460                          - - Pro Glu Val Phe Gly Asn Leu Asn Ser Val Me - #t Cys Val Glu Gly Glu      465                 4 - #70                
 4 - #75                 4 -      #80                                                                              - - Ala Gly Ser Gly Lys Thr Val Leu Leu Lys Ly - #s Ile Ala Phe Leu        Trp                                                                                             485  - #               490  - #               495             - - Ala Ser Gly Cys Cys Pro Leu Leu Asn Arg Ph - #e Gln Leu Val Phe Tyr                  500      - #           505      - #           510                  - - Leu Ser Leu Ser Ser Thr Arg Pro Asp Glu Gl - #y Leu Ala Ser Ile Ile              515          - #       520          - #       525                      - - Cys Asp Gln Leu Leu Glu Lys Glu Gly Ser Va - #l Thr Glu Met Cys Met          530              - #   535              - #   540                          - - Arg Asn Ile Ile Gln Gln Leu Lys Asn Gln Va - #l Leu Phe Leu Leu Asp      545                 5 - #50                 5 - #55                 5 -      #60                                                                              - - Asp Tyr Lys Glu Ile Cys Ser Ile Pro Gln Va - #l Ile Gly Lys Leu        Ile                                                                                             565  - #               570  - #               575             - - Gln Lys Asn His Leu Ser Arg Thr Cys Leu Le - #u Ile Ala Val Arg Thr                  580      - #           585      - #           590                  - - Asn Arg Ala Arg Asp Ile Arg Arg Tyr Leu Gl - #u Thr Ile Leu Glu Ile              595          - #       600          - #       605                      - - Gln Ala Phe Pro Phe Tyr Asn Thr Val Cys Il - #e Leu Arg Lys Leu Phe          610              - #   615              - #   620                          - - Ser His Asn Met Thr Arg Leu Arg Lys Phe Me - #t Val Tyr Phe Gly Lys      625                 6 - #30                 6 - #35                 6 -      #40                                                                              - - Asn Gln 
Ser Leu Gln Lys Ile Gln Lys Thr Pr - #o Leu Phe Val Ala        Ala                                                                                             645  - #               650  - #               655             - - Ile Cys Ala His Trp Phe Gln Tyr Pro Phe As - #p Pro Ser Phe Asp Asp                  660      - #           665      - #           670                  - - Val Ala Val Phe Lys Ser Tyr Met Glu Arg Le - #u Ser Leu Arg Asn Lys              675          - #       680          - #       685                      - - Ala Thr Ala Glu Ile Leu Lys Ala Thr Val Se - #r Ser Cys Gly Glu Leu          690              - #   695              - #   700                          - - Ala Leu Lys Gly Phe Phe Ser Cys Cys Phe Gl - #u Phe Asn Asp Asp Asp      705                 7 - #10                 7 - #15                 7 -      #20                                                                              - - Leu Ala Glu Ala Gly Val Asp Glu Asp Glu As - #p Leu Thr Met Cys        Leu                                                                                             725  - #               730  - #               735             - - Met Ser Lys Phe Thr Ala Gln Arg Leu Arg Pr - #o Phe Tyr Arg Phe Leu                  740      - #           745      - #           750                  - - Ser Pro Ala Phe Gln Glu Phe Leu Ala Gly Me - #t Arg Leu Ile Glu Leu              755          - #       760          - #       765                      - - Leu Asp Ser Asp Arg Gln Glu His Gln Asp Le - #u Gly Leu Tyr His Leu          770              - #   775              - #   780                          - - Lys Gln Ile Asn Ser Pro Met Met Thr Val Se - #r Ala Tyr Asn Asn Phe      785                 7 - #90                 7 - #95                 8 -      #00                                                                              - - Leu Asn Tyr Val Ser Ser Leu Pro Ser Thr Ly - #s Ala Gly Pro Lys        Ile                                                              
                               805  - #               810  - #               815             - - Val Ser His Leu Leu His Leu Val Asp Asn Ly - #s Glu Ser Leu Glu Asn                  820      - #           825      - #           830                  - - Ile Ser Glu Asn Asp Asp Tyr Leu Lys His Gl - #n Pro Glu Ile Ser Leu              835          - #       840          - #       845                      - - Gln Met Gln Leu Leu Arg Gly Leu Trp Gln Il - #e Cys Pro Gln Ala Tyr          850              - #   855              - #   860                          - - Phe Ser Met Val Ser Glu His Leu Leu Val Le - #u Ala Leu Lys Thr Ala      865                 8 - #70                 8 - #75                 8 -      #80                                                                              - - Tyr Gln Ser Asn Thr Val Ala Ala Cys Ser Pr - #o Phe Val Leu Gln        Phe                                                                                             885  - #               890  - #               895             - - Leu Gln Gly Arg Thr Leu Thr Leu Gly Ala Le - #u Asn Leu Gln Tyr Phe                  900      - #           905      - #           910                  - - Phe Asp His Pro Glu Ser Leu Ser Leu Leu Ar - #g Ser Ile His Phe Ser              915          - #       920          - #       925                      - - Ile Arg Gly Asn Lys Thr Ser Pro Arg Ala Hi - #s Phe Ser Val Leu Glu          930              - #   935              - #   940                          - - Thr Cys Phe Asp Lys Ser Gln Val Pro Thr Il - #e Asp Gln Asp Tyr Ala      945                 9 - #50                 9 - #55                 9 -      #60                                                                              - - Ser Ala Phe Glu Pro Met Asn Glu Trp Glu Ar - #g Asn Leu Ala Glu        Lys                                                                                             965  - #               970  - #               975             - - Glu Asp Asn Val Lys Ser Tyr 
Met Asp Met Gl - #n Arg Arg Ala Ser Pro                  980      - #           985      - #           990                  - - Asp Leu Ser Thr Gly Tyr Trp Lys Leu Ser Pr - #o Lys Gln Tyr Lys Ile              995          - #      1000           - #     1005                      - - Pro Cys Leu Glu Val Asp Val Asn Asp Ile As - #p Val Val Gly Gln Asp         1010              - #  1015               - # 1020                          - - Met Leu Glu Ile Leu Met Thr Val Phe Ser Al - #a Ser Gln Arg Ile Glu      1025               1030 - #               1035  - #              1040          - - Leu His Leu Asn His Ser Arg Gly Phe Ile Gl - #u Ser Ile Arg Pro Ala                     1045  - #              1050   - #             1055              - - Leu Glu Leu Ser Lys Ala Ser Val Thr Lys Cy - #s Ser Ile Ser Lys Leu                 1060      - #          1065       - #         1070                  - - Glu Leu Ser Ala Ala Glu Gln Glu Leu Leu Le - #u Thr Leu Pro Ser Leu             1075          - #      1080           - #     1085                      - - Glu Ser Leu Glu Val Ser Gly Thr Ile Gln Se - #r Gln Asp Gln Ile Phe         1090              - #  1095               - # 1100                          - - Pro Asn Leu Asp Lys Phe Leu Cys Leu Lys Gl - #u Leu Ser Val Asp Leu      1105               1110 - #               1115  - #              1120          - - Glu Gly Asn Ile Asn Val Phe Ser Val Ile Pr - #o Glu Glu Phe Pro Asn                     1125  - #              1130   - #             1135              - - Phe His His Met Glu Lys Leu Leu Ile Gln Il - #e Ser Ala Glu Ser                     1140      - #          1145       - #         1150                __________________________________________________________________________
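
For illustration only, the PCR primer entries of the sequence listing above lend themselves to a simple consistency check. The following minimal Python sketch, which is not part of the original disclosure, compares a few primer sequences copied from the listing against their declared <211> LENGTH values and derives the complementary strand of each. The names PRIMERS, reverse_complement and length_mismatches are hypothetical helpers introduced solely for this example.

```python
# Illustrative sketch only; the helper names are hypothetical and not part of the disclosure.
# SEQ ID NO -> (primer sequence copied from the sequence listing, declared <211> LENGTH).
PRIMERS = {
    7: ("ctctcagcctgctcttcagat", 21),
    11: ("catttggcatgttccttccaag", 22),
    12: ("gtagatgaatactgatgtttcataatt", 27),
    15: ("catgttttaagtctcggtgctctg", 24),
    20: ("acaaagttcaccacggctctg", 21),
}

# Watson-Crick base pairing for lowercase DNA letters.
COMPLEMENT = str.maketrans("acgt", "tgca")


def reverse_complement(seq: str) -> str:
    """Return the complementary strand, written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def length_mismatches(primers: dict) -> list:
    """Return the SEQ ID NOs whose sequence length differs from the declared length."""
    return [seq_id for seq_id, (seq, declared) in primers.items() if len(seq) != declared]


if __name__ == "__main__":
    print("length mismatches:", length_mismatches(PRIMERS))
    for seq_id, (seq, _) in PRIMERS.items():
        print(seq_id, reverse_complement(seq))
```

An empty mismatch list indicates that the transcribed primers agree with their declared lengths; the same check could be extended to the remaining primer entries.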

The embodiments of the invention in which an exclusive property or privilege is claimed are defined as follows:
 1. A human gene isolated from the Spinal Muscular Atrophy containing region of human chromosome 5q13, said gene spanning approximately 80 kb of genomic DNA and comprising a cDNA sequence having exons 1 to 17 which encodes neuronal apoptosis inhibitor protein.
 2. The human gene as claimed in claim 1, wherein a deletion in at least one of said exons is causative of type I, II and III spinal muscular atrophy.
 3. The human gene as claimed in claim 2, wherein a deletion of exons 5 or 6 is causative of spinal muscular atrophy.
 4. The human gene as claimed in claim 3, wherein exons 5 and 6 are identified by a NotI restriction site dissecting exon 6.
 5. The human gene as claimed in claim 1, wherein said cDNA comprises 17 exons of at least 5.5 kb and having a restriction map for exons 2 through 11 of FIG. 8.
 6. The human gene as claimed in claim 1, wherein said cDNA comprises 17 exons of at least 5.5 kb and having a restriction map for exons 2 through 16 of FIG. 9D.
 7. A purified nucleotide sequence wherein said sequence comprising genomic DNA, cDNA, antisense DNA, homologous DNA, or mRNA, is selected from at least one of the exons of the cDNA sequence SEQ ID NO:1, shown in Table 4.
 8. A purified nucleotide sequence comprising at least 18 sequential nucleotides which hybridizes to at least one nucleotide sequence of claim 7.
 9. The purified nucleotide sequence comprising at least 18 sequential nucleotides which is complementary to the nucleotide sequence of claim 7.
 10. The purified nucleotide sequence of claim 7, wherein said sequence comprises the nucleotide sequence of any one of exons 1 to 16 of Table 4.
 11. A nucleotide probe, said probe comprising a sequence of at least 18 sequential nucleotides of the sequence of claim 10.
 12. The nucleotide probe as claimed in claim 11, wherein said probe comprises DNA or RNA.
 13. A PCR primer comprising a sequence of at least 18 sequential nucleotides of the sequence of claim 7.
 14. A cloning or expression vector containing the nucleotide sequence of claim 10.
 15. The human gene of claim 1, wherein said cDNA sequence of said gene hybridizes with the probe of claim 11.
 16. The human gene as claimed in claim 1, wherein exons 5 to 12 share significant homology with baculoviral inhibitor of apoptosis protein domains.
 17. The human gene as claimed in claim 1, wherein said gene has a 5' UTR having a perfect duplication of a 90 bp region before exon 2 and in the region bridging exons 2 and 3 of Table 4.
 18. Neuronal apoptosis inhibitor protein encoded by the human gene of claim 1.
 19. The protein of claim 18, wherein said protein has an inhibitor of apoptosis domain, two potential transmembrane regions bracketing the inhibitor of apoptosis domain and a contiguous GTP binding site.
 20. The protein as claimed in claim 18, wherein said protein is encoded by the nucleotide sequence of exons 1 to 16 of Table 4.
 21. The protein as claimed in claim 20, wherein amino acids 185-250 encoded by exons 6, 7 and 8 are 35% homologous to amino acids 300-370 encoded by exons 10, 11 and 12.
 22. A neuronal apoptosis inhibitor protein fragment of said protein of claim 18, said fragment comprising at least 15 sequential amino acids of SEQ ID NO:2.
 23. A neuronal apoptosis inhibitor protein fragment of said protein of claim 18, said fragment comprising an amino acid sequence selected from the group of amino acid sequences encoded by exons 1 to 16 of Table 4.
 24. A neuronal apoptosis inhibitor protein fragment, comprising an amino acid sequence encoded by any one of exons 1 to 16 of the nucleotide sequence of claim 10.
 25. The neuronal apoptosis inhibitor protein fragment as claimed in claim 20, said protein fragment having a deletion of the amino acids encoded by exons 5 or 6 causative of spinal muscular atrophy.
 26. A monoclonal or polyclonal antibody which recognizes the protein as claimed in claim 20.
 27. A monoclonal or polyclonal antibody which recognizes the neuronal apoptosis inhibitor protein responsible for spinal muscular atrophy.
 28. A method for analyzing a biological sample to diagnose the presence or absence of at least a portion of the gene encoding neuronal apoptosis inhibitor protein or the protein, said method comprising: i) providing a biological sample containing the Spinal Muscular Atrophy region q13 of human chromosome 5 or the neuronal apoptosis inhibitor protein; ii) conducting a biological assay to determine the presence or absence in said biological sample of at least a member selected from the group consisting of: a) the nucleotide sequence of exons 1 to 16 of Table 4; and b) the amino acid sequence of exons 4 to 16 of Table 4.
 29. The method of claim 28 for diagnosing the risk for a human of developing spinal muscular atrophy, said method additionally comprising assaying for mutations in the sequences of step ii).
 30. The method of claim 27, wherein the presence or absence of exons 5 or 6 of group (a) are assayed.
 31. The method of claim 28, said method further comprising determining the intact gene copy number of chromosome 5 which encodes neuronal apoptosis inhibitor protein.
 32. The method of claim 28, wherein said biological assay comprises an assay selected from the group consisting of DNA hybridization, restriction enzyme digest, PCR amplification, mRNA detection and DNA sequencing.
 33. The method of claim 32, wherein said biological assay comprises PCR amplification of exon 5 and exon 6 using PCR primers selected from the 5' region of exon 5 and the 3' region of exon 6.